TurnTrout's shortform feed

post by TurnTrout · 2019-06-30T18:56:49.775Z · LW · GW · 193 comments


Comments sorted by top scores.

comment by TurnTrout · 2019-06-30T18:57:46.543Z · LW(p) · GW(p)

comment by TurnTrout · 2020-06-29T00:46:47.566Z · LW(p) · GW(p)

For the last two years, typing for 5+ minutes hurt my wrists. I tried a lot of things: shots, physical therapy, trigger-point therapy, acupuncture, massage tools, wrist and elbow braces at night, exercises, stretches. Sometimes it got better. Sometimes it got worse.

No Beat Saber, no lifting weights, and every time I read a damn book I would start translating the punctuation into Dragon NaturallySpeaking syntax.

Text: "Consider a bijection $f: X \to Y$"

My mental narrator: "Cap consider a bijection space dollar foxtrot colon cap x backslash tango oscar cap y dollar"

Have you ever tried dictating a math paper in LaTeX? Or dictating code? Telling your computer "click" and waiting a few seconds while resisting the temptation to just grab the mouse? Dictating your way through a computer science PhD?

And then.... and then, a month ago, I got fed up. What if it was all just in my head, at this point? I'm only 25. This is ridiculous. How can it possibly take me this long to heal such a minor injury?

I wanted my hands back - I wanted it real bad. I wanted it so bad that I did something dirty: I made myself believe something. Well, actually, I pretended to be a person who really, really believed his hands were fine and healing and the pain was all psychosomatic.

And... it worked, as far as I can tell. It totally worked. I haven't dictated in over three weeks. I play Beat Saber as much as I please. I type for hours and hours a day with only the faintest traces of discomfort.


Replies from: DanielFilan, vanessa-kosoy, Teerth Aloke, steve2152, avturchin, Raemon
comment by DanielFilan · 2020-09-11T23:09:16.052Z · LW(p) · GW(p)

Is the problem still gone?

Replies from: TurnTrout, TurnTrout
comment by TurnTrout · 2021-01-23T15:31:18.983Z · LW(p) · GW(p)

Still gone. I'm now sleeping without wrist braces and doing intense daily exercise, like bicep curls and pushups.

comment by TurnTrout · 2020-09-12T02:40:12.239Z · LW(p) · GW(p)

Totally 100% gone. Sometimes I go weeks forgetting that pain was ever part of my life. 

comment by Vanessa Kosoy (vanessa-kosoy) · 2020-06-29T12:12:17.028Z · LW(p) · GW(p)

I'm glad it worked :) It's not that surprising given that pain is known to be susceptible to the placebo effect. I would link the SSC post, but, alas...

Replies from: raj-thimmiah
comment by Raj Thimmiah (raj-thimmiah) · 2021-03-27T02:04:00.372Z · LW(p) · GW(p)

You able to link to it now?

comment by Teerth Aloke · 2020-06-29T01:46:42.987Z · LW(p) · GW(p)

This is unlike anything I have heard!

Replies from: mingyuan
comment by mingyuan · 2020-06-29T01:54:14.151Z · LW(p) · GW(p)

It's very similar to what John Sarno (author of Healing Back Pain and The Mindbody Prescription) preaches, as well as Howard Schubiner. There's also a rationalist-adjacent dude who started a company (Axy Health) based on these principles. Fuck if I know how any of it works though, and it doesn't work for everyone. Congrats though TurnTrout!

Replies from: Teerth Aloke
comment by Teerth Aloke · 2020-06-29T03:52:52.824Z · LW(p) · GW(p)

My Dad it seems might have psychosomatic stomach ache. How to convince him to convince himself that he has no problem?

Replies from: mingyuan
comment by mingyuan · 2020-06-29T04:52:34.336Z · LW(p) · GW(p)

If you want to try out the hypothesis, I recommend that he (or you, if he's not receptive to it) read Sarno's book. I want to reiterate that it does not work in every situation, but you're welcome to take a look.

comment by Steven Byrnes (steve2152) · 2021-01-23T20:03:32.860Z · LW(p) · GW(p)

Me too! [LW(p) · GW(p)]

Replies from: TurnTrout
comment by TurnTrout · 2021-01-23T20:11:40.644Z · LW(p) · GW(p)

There's a reasonable chance that my overcoming RSI was causally downstream of that exact comment of yours.

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2021-01-23T20:33:45.220Z · LW(p) · GW(p)

Happy to have (maybe) helped! :-)

comment by avturchin · 2020-06-29T10:46:34.826Z · LW(p) · GW(p)

Looks like reverse stigmata effect.

comment by Raemon · 2020-06-29T02:34:23.704Z · LW(p) · GW(p)

Woo faith healing! 

(hope this works out longterm, and doesn't turn out to be secretly hurting still) 

Replies from: TurnTrout
comment by TurnTrout · 2020-06-29T03:16:21.709Z · LW(p) · GW(p)

aren't we all secretly hurting still?

Replies from: mingyuan
comment by mingyuan · 2020-06-29T04:54:01.028Z · LW(p) · GW(p)


comment by TurnTrout · 2019-12-17T06:37:41.969Z · LW(p) · GW(p)

My maternal grandfather was the scientist in my family. I was young enough that my brain hadn't decided to start doing its job yet, so my memories with him are scattered and inconsistent and hard to retrieve. But there's no way that I could forget all of the dumb jokes he made; how we'd play Scrabble and he'd (almost surely) pretend to lose to me [? · GW]; how, every time he got to see me, his eyes would light up with boyish joy.

My greatest regret took place in the summer of 2007. My family celebrated the first day of the school year at an all-you-can-eat buffet, delicious food stacked high as the eye could fathom under lights of green, red, and blue. After a particularly savory meal, we made to leave the surrounding mall. My grandfather asked me to walk with him.

I was a child who thought to avoid being seen too close to uncool adults. I wasn't thinking. I wasn't thinking about hearing the cracking sound of his skull against the ground. I wasn't thinking about turning to see his poorly congealed blood flowing from his forehead out onto the floor. I wasn't thinking I would nervously watch him bleed for long minutes while shielding my seven-year-old brother from the sight. I wasn't thinking that I should go visit him in the hospital, because that would be scary. I wasn't thinking he would die of a stroke the next day.

I wasn't thinking the last thing I would ever say to him would be "no[, I won't walk with you]".

Who could think about that? No, that was not a foreseeable mistake. Rather, I wasn't thinking about how precious and short my time with him was. I wasn't appreciating how fragile my loved ones are. I didn't realize that something as inconsequential as an unidentified ramp in a shopping mall was allowed to kill my grandfather.

I miss you, Joseph Matt.

Replies from: TurnTrout, Raemon, habryka4
comment by TurnTrout · 2019-12-17T21:34:08.992Z · LW(p) · GW(p)

My mother told me my memory was indeed faulty. He never asked me to walk with him; instead, he asked me to hug him during dinner. I said I'd hug him "tomorrow".

But I did, apparently, want to see him in the hospital; it was my mother and grandmother who decided I shouldn't see him in that state.

comment by Raemon · 2019-12-17T22:44:45.087Z · LW(p) · GW(p)


comment by habryka (habryka4) · 2019-12-17T18:44:48.154Z · LW(p) · GW(p)

Thank you for sharing.

comment by TurnTrout · 2020-12-17T19:49:42.125Z · LW(p) · GW(p)

Earlier today, I was preparing for an interview. I warmed up by replying stream-of-consciousness to imaginary questions I thought they might ask. Seemed worth putting here.

What do you think about AI timelines?

I’ve obviously got a lot of uncertainty. I’ve got a bimodal distribution, binning into “DL is basically sufficient and we need at most 1 big new insight to get to AGI” and “we need more than 1 big insight”

So the first bin has most of the probability in the 10-20 years from now, and the second is more like 45-80 years, with positive skew. 

Some things driving my uncertainty are, well, a lot. One thing that drives how things turn out (but not really how fast we’ll get there) is: will we be able to tell we’re close 3+ years in advance, and if so, how quickly will the labs react? Gwern Branwen made a point a few months ago, which is like, OAI has really been validated on this scaling hypothesis, and no one else is really betting big on it because they’re stubborn/incentives/etc, despite the amazing progress from scaling. If that’s true, then even if it's getting pretty clear that one approach is working better, we might see a slower pivot and have a more unipolar scenario. 

I feel dissatisfied with pontificating like this, though, because there are so many considerations pulling so many different ways. I think one of the best things we can do right now is to identify key considerations. There was work on expert models that showed that training simple featurized linear models often beat domain experts, quite soundly. It turned out that most of the work the experts did was locating the right features, and not necessarily assigning very good weights to those features.

So one key consideration I recently read, IMO, was Evan Hubinger talking about homogeneity of AI systems: if they’re all pretty similarly structured, they’re plausibly roughly equally aligned, which would really decrease the probability of aligned vs unaligned AGIs duking it out.

What do you think the alignment community is getting wrong?

When I started thinking about alignment, I had this deep respect for everything ever written, like I thought the people were so smart (which they generally are) and the content was polished and thoroughly viewed through many different frames (which it wasn’t/isn’t). I think the field is still young enough that: in our research, we should be executing higher-variance cognitive moves, trying things and breaking things and coming up with new frames. Think about ideas from new perspectives.

I think right now, a lot of people are really optimizing for legibility and defensibility. I think I do that more than I want/should. Usually the “non-defensibility” stage lasts the first 1-2 months on a new paper, and then you have to defend thoughts. This can make sense for individuals, and it should be short some of the time, but as a population I wish defensibility weren’t as big of a deal for people / me. MIRI might be better at avoiding this issue, but a not-really-defensible intuition I have is that they’re freer in thought, but within the MIRI paradigm, if that makes sense. Maybe that opinion would change if I talked with them more.

Anyways, I think many of the people who do the best work aren’t optimizing for this.

comment by TurnTrout · 2020-04-26T22:24:15.587Z · LW(p) · GW(p)

If you want to read Euclid's Elements, look at this absolutely gorgeous online rendition:

Replies from: Benito, william-walker
comment by TurnTrout · 2021-05-01T13:47:52.759Z · LW(p) · GW(p)

Comment #1000 on LessWrong :)

Replies from: niplav
comment by niplav · 2021-05-01T19:39:25.607Z · LW(p) · GW(p)

With 5999 karma!

Edit: Now 6000 – I weak-upvoted an old post of yours [LW · GW] I hadn't upvoted before.

comment by TurnTrout · 2020-02-12T01:51:02.670Z · LW(p) · GW(p)

For quite some time, I've disliked wearing glasses. However, my eyes are sensitive, so I dismissed the possibility of contacts.

Over break, I realized I could still learn to use contacts, it would just take me longer. Sure enough, it took me an hour and five minutes to put in my first contact, and I couldn't get it out on my own. An hour of practice later, I put in a contact on my first try, and took it out a few seconds later. I'm very happily wearing contacts right now, as a matter of fact.

I'd suffered glasses for over fifteen years because of a cached decision – because I didn't think to rethink something literally right in front of my face every single day.

What cached decisions have you not reconsidered?

comment by TurnTrout · 2021-06-22T00:10:37.707Z · LW(p) · GW(p)

I'm pretty sure that LessWrong will never have profile pictures - at least, I hope not! But my partner Emma recently drew me something very special:

comment by TurnTrout · 2020-07-07T23:04:03.243Z · LW(p) · GW(p)

I think instrumental convergence also occurs in the model space for machine learning. For example, many different architectures likely learn edge detectors in order to minimize classification loss on MNIST. But wait - you'd also learn edge detectors to maximize classification loss on MNIST (loosely, getting 0% on a multiple-choice exam requires knowing all of the right answers). I bet you'd learn these features for a wide range of cost functions. I wonder if that's already been empirically investigated?

And, same for adversarial features. And perhaps, same for mesa optimizers (understanding how to stop mesa optimizers from being instrumentally convergent seems closely related to solving inner alignment). 

What can we learn about this?
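
The multiple-choice analogy above can be made concrete in the binary case. This is only a toy sketch: the data, the hand-picked weights, and the helper names (`predict`, `accuracy`) are all hypothetical, and nothing is trained. It shows that negating a perfect linear classifier yields a perfectly wrong one that relies on exactly the same feature directions. With ten MNIST classes the correspondence is looser, since scoring 0% only requires knowing one wrong answer per image.

```python
def predict(weights, bias, features):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of (features, label) pairs classified correctly."""
    correct = sum(predict(weights, bias, x) == y for x, y in data)
    return correct / len(data)


# Hypothetical toy data: the label is 1 exactly when feature 0 exceeds feature 1.
data = [((2.0, 1.0), 1), ((0.5, 3.0), 0), ((4.0, 0.0), 1), ((1.0, 2.5), 0)]

# Hand-picked (not trained) weights that separate the toy data.
good_w, good_b = (1.0, -1.0), 0.0

# The "loss-maximizing" model is the same feature detector with its sign flipped.
bad_w, bad_b = tuple(-w for w in good_w), -good_b

print(accuracy(good_w, good_b, data))  # every toy example classified correctly
print(accuracy(bad_w, bad_b, data))    # every toy example classified incorrectly
```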

Replies from: evhub
comment by evhub · 2020-07-07T23:36:08.711Z · LW(p) · GW(p)

A lot of examples of this sort of stuff show up in OpenAI clarity's circuits analysis work. In fact, this is precisely their Universality hypothesis. See also my discussion here [LW · GW].

comment by TurnTrout · 2020-01-13T02:15:39.463Z · LW(p) · GW(p)

While reading Focusing today, I thought about the book and wondered how many exercises it would have. I felt a twinge of aversion. In keeping with my goal of increasing internal transparency, I said to myself: "I explicitly and consciously notice that I felt averse to some aspect of this book".

I then Focused on the aversion. Turns out, I felt a little bit disgusted, because a part of me reasoned thusly:

If the book does have exercises, it'll take more time. That means I'm spending reading time on things that aren't math textbooks. That means I'm slowing down.

(Transcription of a deeper Focusing on this reasoning)

I'm afraid of being slow. Part of it is surely the psychological remnants of the RSI I developed in the summer of 2018. That is, slowing down is now emotionally associated with disability and frustration. There was a period of meteoric progress as I started reading textbooks and doing great research, and then there was pain. That pain struck even when I was just trying to take care of myself, sleep, open doors. That pain then left me on the floor of my apartment, staring at the ceiling, desperately willing my hands to just get better. They didn't (for a long while), so I just lay there and cried. That was slow, and it hurt. No reviews, no posts, no typing, no coding. No writing, slow reading. That was slow, and it hurt.

Part of it used to be a sense of "I need to catch up and learn these other subjects which [Eliezer / Paul / Luke / Nate] already know". Through internal double crux, I've nearly eradicated this line of thinking, which is neither helpful nor relevant nor conducive to excitedly learning the beautiful settled science of humanity. Although my most recent post [LW · GW] touched on impostor syndrome, that isn't really a thing for me. I feel reasonably secure in who I am, now (although part of me worries that others wrongly view me as an impostor?).

However, I mostly just want to feel fast, efficient, and swift again. I sometimes feel like I'm in a race with Alex, and I feel like I'm losing.

comment by TurnTrout · 2019-07-05T23:00:58.761Z · LW(p) · GW(p)

I passed a homeless man today. His face was wracked in pain, body rocking back and forth, eyes clenched shut. A dirty sign lay forgotten on the ground: "very hungry".

This man was once a child, with parents and friends and dreams and birthday parties and maybe siblings he'd get in arguments with and snow days he'd hope for.

And now he's just hurting.

And now I can't help him without abandoning others. So he's still hurting. Right now.

Reality is still allowed to make this happen. This is wrong. This has to change.

Replies from: SaidAchmiz, Raemon
comment by Said Achmiz (SaidAchmiz) · 2019-07-06T03:12:51.584Z · LW(p) · GW(p)

How would you help this man, if having to abandon others in order to do so were not a concern? (Let us assume that someone else—someone whose competence you fully trust, and who will do at least as good a job as you will—is going to take care of all the stuff you feel you need to do.)

What is it you had in mind to do for this fellow—specifically, now—that you can’t (due to those other obligations)?

Replies from: TurnTrout, Raemon
comment by TurnTrout · 2019-07-06T05:02:37.715Z · LW(p) · GW(p)

Suppose I actually cared about this man with the intensity he deserved - imagine that he were my brother, father, or best friend.

The obvious first thing to do before interacting further is to buy him a good meal and a healthy helping of groceries. Then, I need to figure out his deal. Is he hurting, or is he also suffering from mental illness?

If the former, I'd go the more straightforward route of befriending him, helping him purchase a sharp business professional outfit, teaching him to interview and present himself with confidence, secure an apartment, and find a job.

If the latter, this gets trickier. I'd still try and befriend him (consistently being a source of cheerful conversation and delicious food would probably help), but he might not be willing or able to get the help he needs, and I wouldn't have the legal right to force him. My best bet might be to enlist the help of a psychological professional for these interactions. If this doesn't work, my first thought would be to influence the local government to get the broader problem fixed (I'd spend at least an hour considering other plans before proceeding further, here). Realistically, there's likely a lot of pressure in this direction already, so I'd need to find an angle from which few others are pushing or pulling where I can make a difference. I'd have to plot out the relevant political forces, study accounts of successful past lobbying, pinpoint the people I need on my side, and then target my influencing accordingly.

(All of this is without spending time looking at bird's-eye research and case studies of poverty reduction; assume counterfactually that I incorporate any obvious improvements to these plans, because I'd care about him and dedicate more than like 4 minutes of thought).

Replies from: SaidAchmiz
comment by Said Achmiz (SaidAchmiz) · 2019-07-06T05:53:48.328Z · LW(p) · GW(p)

Well, a number of questions may be asked here (about desert, about causation, about autonomy, etc.). However, two seem relevant in particular:

First, it seems as if (in your latter scenario) you’ve arrived (tentatively, yes, but not at all unreasonably!) at a plan involving systemic change. As you say, there is quite a bit of effort being expended on this sort of thing already, so, at the margin, any effective efforts on your part would likely be both high-level and aimed in an at-least-somewhat-unusual direction.

… yet isn’t this what you’re already doing?

Second, and unrelatedly… you say:

Suppose I actually cared about this man with the intensity he deserved—imagine that he were my brother, father, or best friend.

Yet it seems to me that, empirically, most people do not expend the level of effort which you describe, even for their siblings, parents, or close friends. Which is to say that the level of emotional and practical investment you propose to make (in this hypothetical situation) is, actually, quite a bit greater than that which most people invest in their family members or close friends.

The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?

Replies from: TurnTrout
comment by TurnTrout · 2019-07-06T06:08:46.662Z · LW(p) · GW(p)
… yet isn’t this what you’re already doing?

I work on technical AI alignment, so some of those I help (in expectation) don't even exist yet. I don't view this as what I'd do if my top priority were helping this man.

The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?

That's a good question. I think the answer is yes, at least for my close family. Recently, I've expended substantial energy persuading my family to sign up for cryonics with me, winning over my mother, brother, and (I anticipate) my aunt. My father has lingering concerns which I think he wouldn't have upon sufficient reflection, so I've designed a similar plan for ensuring he makes what I perceive to be the correct, option-preserving choice. For example, I made significant targeted donations to effective charities on his behalf to offset (what he perceives as) a considerable drawback of cryonics: his inability to also be an organ donor.

A universe in which humanity wins but my dad is gone would be quite sad to me, and I'll take whatever steps necessary to minimize the chances of that.

I don't know how unusual this is. This reminds me of the relevant Harry-Quirrell exchange; most people seem beaten-down and hurt themselves, and I can imagine a world in which people are in better places and going to greater lengths for those they love. I don't know if this is actually what would make more people go to these lengths (just an immediate impression).

comment by Raemon · 2019-07-06T03:30:46.593Z · LW(p) · GW(p)

I predict that this comment is not helpful to Turntrout.

comment by Raemon · 2019-07-05T23:07:11.852Z · LW(p) · GW(p)


Song I wrote about this once (not very polished)

comment by TurnTrout · 2020-04-24T15:38:04.997Z · LW(p) · GW(p)

Weak derivatives

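In calculus, the product rule says $\frac{d}{dx}(f \cdot g) = f' \cdot g + f \cdot g'$. The fundamental theorem of calculus says that the Riemann integral acts as the anti-derivative.[1] Combining these two facts, we derive integration by parts:

$$\int_a^b f'(x)\, g(x)\, dx = f(b)g(b) - f(a)g(a) - \int_a^b f(x)\, g'(x)\, dx.$$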

It turns out that we can use these two properties to generalize the derivative to match some of our intuitions on edge cases. Let's think about the absolute value function:

Image from Wikipedia

The boring old normal derivative isn't defined at $x=0$, but it seems like it'd make sense to be able to say that the derivative is e.g. 0. Why might this make sense?

Taylor's theorem (and its generalizations) characterizes first derivatives as tangent lines with slope $L := f'(x_0)$ which provide good local approximations of $f$ around $x_0$: $f(x) \approx f(x_0) + L(x - x_0)$. You can prove that this is the best approximation you can get using only $f(x_0)$ and $L$! In the absolute value example, defining the "derivative" to be zero at $x=0$ would minimize approximation error on average in neighborhoods around the origin.
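
As a rough numerical check of the claim about the absolute value function, the sketch below compares a few candidate slopes for the line through the origin. The error measure is an assumption: the text does not say which average it has in mind, so mean squared error over a symmetric interval is used here. The function name, the interval radius, and the candidate slopes are illustrative choices.

```python
def mean_squared_error(slope, radius=1.0, samples=2001):
    """Average of (|x| - slope * x)^2 over evenly spaced x in [-radius, radius]."""
    total = 0.0
    for i in range(samples):
        x = -radius + 2.0 * radius * i / (samples - 1)
        total += (abs(x) - slope * x) ** 2
    return total / samples


# Candidate slopes for the approximating line y = slope * x (f(0) = 0 here).
candidate_slopes = [-1.0, -0.5, 0.0, 0.5, 1.0]
errors = {s: mean_squared_error(s) for s in candidate_slopes}
best_slope = min(errors, key=errors.get)

print(best_slope)  # the slope with the smallest average squared error
```

Under squared error the integral works out to $\frac{h^3}{3}(2 + 2L^2)$ on $[-h, h]$, which is smallest at $L = 0$. Mean absolute error would not single out zero: it is the same for every slope in $[-1, 1]$.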

In multivariable calculus, the Jacobian is a tangent plane which again minimizes approximation error (with respect to the Euclidean distance, usually) in neighborhoods around the function. That is, having a first derivative means that the function can be locally approximated by a linear map. It's like a piece of paper that you glue onto the point in question.


This reasoning even generalizes to the infinite-dimensional case with functional derivatives (see my recent functional analysis textbook review [LW(p) · GW(p)]). All of these cases are instances of the Fréchet derivative.

Complex analysis provides another perspective on why this might make sense, but I think you get the idea and I'll omit that for now.

We can define a weaker notion of differentiability which lets us do this – in fact, it lets us define the weak derivative to be anything at $x=0$! Now that I've given some motivation, here's a great explanation of how weak derivatives arise from the criterion of "satisfy integration by parts for all relevant functions".
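
For the absolute value function, the integration-by-parts criterion can be worked through directly. This is a sketch assuming the standard definition: $g$ is a weak derivative of $f$ if $\int f \varphi' \, dx = -\int g \varphi \, dx$ for every smooth test function $\varphi$ that vanishes outside a bounded interval. The symbols $\varphi$ and $\operatorname{sgn}$ are introduced here for illustration.

```latex
\begin{align*}
\int_{-\infty}^{\infty} |x|\,\varphi'(x)\,dx
  &= \int_{-\infty}^{0} (-x)\,\varphi'(x)\,dx + \int_{0}^{\infty} x\,\varphi'(x)\,dx \\
  &= \Big[-x\,\varphi(x)\Big]_{-\infty}^{0} + \int_{-\infty}^{0} \varphi(x)\,dx
     + \Big[x\,\varphi(x)\Big]_{0}^{\infty} - \int_{0}^{\infty} \varphi(x)\,dx \\
  &= \int_{-\infty}^{0} \varphi(x)\,dx - \int_{0}^{\infty} \varphi(x)\,dx \\
  &= -\int_{-\infty}^{\infty} \operatorname{sgn}(x)\,\varphi(x)\,dx .
\end{align*}
```

The boundary terms vanish because $\varphi$ is zero far from the origin and $x\varphi(x)$ is zero at $x = 0$. So $\operatorname{sgn}(x)$ is a weak derivative of $|x|$, and since changing its value at the single point $x = 0$ does not change any of these integrals, that value can be chosen freely.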

  1. As far as I can tell, the indefinite Riemann integral being the anti-derivative means that it's the inverse of $\frac{d}{dx}$ in the group theoretic sense – with respect to composition in the $\mathbb{R}$-vector space of operators on real-valued functions. You might not expect this, because $\int$ maps an integrable function $f$ to a set of functions $\{F + C \mid C \in \mathbb{R}\}$. However, this doesn't mean that the inverse isn't unique (as it must be), because the inverse is in operator-space. ↩︎

Replies from: TurnTrout
comment by TurnTrout · 2020-04-24T15:47:08.477Z · LW(p) · GW(p)

The reason $f'(0)$ is undefined for the absolute value function is that you need the value to be the same for all sequences converging to 0 – both from the left and from the right. There's a nice way to motivate this in higher-dimensional settings by thinking about the action of e.g. complex multiplication, but this is a much stronger notion than real differentiability and I'm not quite sure how to think about motivating the single-valued real case yet. Of course, you can say things like "the theorems just work out nicer if you require both the lower and upper limits be the same"...
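
The disagreement between the two sides is easy to see numerically. The sketch below, with an illustrative helper `difference_quotient` and step sizes chosen as powers of two so the arithmetic is exact, evaluates the difference quotient of the absolute value function along one sequence approaching 0 from the right and one from the left.

```python
def difference_quotient(f, x0, h):
    """Slope of the secant line through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h


# Step sizes 1/2, 1/4, ..., 1/1024 shrinking toward 0.
steps = [2.0 ** -n for n in range(1, 11)]

from_right = [difference_quotient(abs, 0.0, h) for h in steps]
from_left = [difference_quotient(abs, 0.0, -h) for h in steps]

print(set(from_right))  # quotients along the sequence approaching from the right
print(set(from_left))   # quotients along the sequence approaching from the left
```

Every quotient from the right is 1 and every quotient from the left is -1, so no single limit exists at 0.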

comment by TurnTrout · 2019-12-04T00:50:30.859Z · LW(p) · GW(p)

Listening to Eneasz Brodski's excellent reading of Crystal Society, I noticed how curious I am about how AGI will end up working. How are we actually going to do it? What are those insights? I want to understand quite badly, which I didn't realize until experiencing this (so far) intelligently written story.

Similarly, how do we actually "align" agents, and what are good frames for thinking about that?

Here's to hoping we don't sate the former curiosity too early.

comment by TurnTrout · 2021-07-29T15:08:44.311Z · LW(p) · GW(p)

If you raised children in many different cultures, "how many" different reflectively stable moralities could they acquire? (What's the "VC dimension" of human morality, without cheating by e.g. directly reprogramming brains?)

(This is probably a Wrong Question, but I still find it interesting to ask.)

comment by TurnTrout · 2019-09-18T21:57:15.893Z · LW(p) · GW(p)

Good, original thinking feels present to me - as if mental resources are well-allocated.

The thought which prompted this:

Sure, if people are asked to solve a problem and say they can't after two seconds, yes - make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.

Reacting to a bit of HPMOR here, I noticed something felt off about Harry's reply to the Fred/George-tried-for-two-seconds thing. Having a bit of experience noticing confusion, I did not think "I notice I am confused" (although this can be useful). I did not think "Eliezer probably put thought into this", or "Harry is kinda dumb in certain ways - so what if he's a bit unfair here?". Without resurfacing, or distraction, or wondering if this train of thought is more fun than just reading further, I just thought about the object-level exchange.

People need to allocate mental energy wisely; this goes far beyond focusing on important tasks. Your existing mental skillsets already optimize and auto-pilot certain mental motions for you, so you should allocate less deliberation to them. In this case, the confusion-noticing module was honed; by not worrying about how well I noticed confusion, I was able to quickly have an original thought.

When thought processes derail or brainstorming sessions bear no fruit, inappropriate allocation may be to blame. For example, if you're anxious, you're interrupting the actual thoughts with "what-if"s.

To contrast, non-present thinking feels like a controller directing thoughts to go from here to there: do this and then, check that, come up for air over and over... Present thinking is a stream of uninterrupted strikes, the train of thought chugging along without self-consciousness. Moving, instead of thinking about moving while moving.

I don't know if I've nailed down the thing I'm trying to point at yet.

Replies from: TurnTrout
comment by TurnTrout · 2019-09-19T16:04:52.767Z · LW(p) · GW(p)

Sure, if people are asked to solve a problem and say they can't after two seconds, yes - make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.

Expanding on this, there is an aspect of Actually Trying that is probably missing from S1 precomputation. So, maybe the two-second "attempt" is actually useless for most people because subconscious deliberation isn't hardass enough at giving its all, at making desperate and extraordinary efforts to solve the problem.

comment by TurnTrout · 2020-10-26T03:20:57.104Z · LW(p) · GW(p)

If you're tempted to write "clearly" in a mathematical proof, the word quite likely glosses over a key detail you're confused about. Use that temptation as a clue for where to dig in deeper.

At least, that's how it is for me.

comment by TurnTrout · 2019-11-29T02:52:46.899Z · LW(p) · GW(p)

From my Facebook

My life has gotten a lot more insane over the last two years. However, it's also gotten a lot more wonderful, and I want to take time to share how thankful I am for that.

Before, life felt like... a thing that you experience, where you score points and accolades and check boxes. It felt kinda fake, but parts of it were nice. I had this nice cozy little box that I lived in, a mental cage circumscribing my entire life. Today, I feel (much more) free.

I love how curious I've become, even about "unsophisticated" things. Near dusk, I walked the winter wonderland of Ogden, Utah with my aunt and uncle. I spotted this gorgeous red ornament hanging from a tree, with a hunk of snow stuck to it at north-east orientation. This snow had apparently decided to defy gravity. I just stopped and stared. I was so confused. I'd kinda guessed that the dry snow must induce a huge coefficient of static friction, hence the winter wonderland. But that didn't suffice to explain this. I bounded over and saw the smooth surface was iced, so maybe part of the snow melted in the midday sun, froze as evening advanced, and then the part-ice part-snow chunk stuck much more solidly to the ornament.

Maybe that's right, and maybe not. The point is that two years ago, I'd have thought this was just "how the world worked", and it was up to physicists to understand the details. Whatever, right? But now, I'm this starry-eyed kid in a secret shop full of wonderful secrets. Some secrets are already understood by some people, but not by me. A few secrets I am the first to understand. Some secrets remain unknown to all. All of the secrets are enticing.

My life isn't always like this; some days are a bit gray and draining. But many days aren't, and I'm so happy about that.

Socially, I feel more fascinated by people in general, more eager to hear what's going on in their lives, more curious what it feels like to be them that day. In particular, I've fallen in love with the rationalist and effective altruist communities, which was totally a thing I didn't even know I desperately wanted until I already had it in my life! There are so many kind, smart, and caring people, inside many of whom burns a similarly intense drive to make the future nice, no matter what. Even though I'm estranged from the physical community much of the year, I feel less alone: there's a home for me somewhere.

Professionally, I'm working on AI alignment, which I think is crucial for making the future nice. Two years ago, I felt pretty sidelined - I hadn't met the bars I thought I needed to meet in order to do Important Things, so I just planned for a nice, quiet, responsible, normal life, doing little kindnesses. Surely the writers of the universe's script would make sure things turned out OK, right?

I feel in the game now. The game can be daunting, but it's also thrilling. It can be scary, but it's important. It's something we need to play, and win. I feel that viscerally. I'm fighting for something important, with every intention of winning.

I really wish I had the time to hear from each and every one of you. But I can't, so I do what I can: I wish you a very happy Thanksgiving. :)

comment by TurnTrout · 2019-11-13T17:18:29.555Z · LW(p) · GW(p)

Yesterday, I put the finishing touches on my chef d'œuvre, a series of important safety-relevant proofs I've been striving for since early June. Strangely, I felt a great exhaustion come over me. These proofs had been my obsession for so long, and now - now, I'm done.

I've had this feeling before; three years ago, I studied fervently for a Google interview. The literal moment the interview concluded, a fever overtook me. I was sick for days. All the stress and expectation and readiness-to-fight which had been pent up, released.

I don't know why this happens. But right now, I'm still a little tired, even after getting a good night's sleep.

Replies from: Hazard
comment by Hazard · 2019-11-13T19:01:23.568Z · LW(p) · GW(p)

This happens to me sometimes. I know several people who have this happen at the end of a Uni semester. Hope you can get some rest.

comment by TurnTrout · 2020-10-09T16:17:36.093Z · LW(p) · GW(p)

I went to the doctor's yesterday. This was embarrassing for them on several fronts.

First, I had to come in to do an appointment which could be done over telemedicine, but apparently there are regulations against this.

Second, while they did temp checks and required masks (yay!), none of the nurses or doctors actually wore anything stronger than a surgical mask. I'm coming in here with a KN95 + goggles + face shield because why not take cheap precautions to reduce the risk, and my own doctor is just wearing a surgical? I bought 20 KN95s for, like, 15 bucks on Amazon.

Third, and worst of all, my own doctor spouted absolute nonsense. The mildest insinuation was that surgical facemasks only prevent transmission, but I seem to recall that many kinds of surgical masks halve your chances of infection as well.

Then, as I understood it, he first claimed that coronavirus and the flu have comparable case fatality rates. I wasn't sure if I'd heard him correctly - this was an expert talking about his area of expertise, so I felt like I had surely misunderstood him. I was taken aback. But, looking back, that's what he meant.

He went on to suggest that we can't expect COVID immunity to last (wrong) but also that we need to hit 70% herd immunity (wrong). How could you even believe both of these things at the same time? Under those beliefs, are we all just going to get sick forever? Maybe he didn't notice the contradiction because he made the claims a few minutes apart.

Next, he implied that it's not a huge deal that people have died because a lot of them had comorbidities. Except that's not how comorbidities and counterfactual impact work. "No one's making it out of here alive", he says. An amusing rationalization.

He also claimed that nursing homes have an average stay length of 5 months. Wrong. AARP says it's 1.5 years for men, 2.5 years for women, but I've seen other estimates elsewhere, all much higher than 5 months. Not sure what the point of this was - old people are 10 minutes from dying anyways? What?

Now, perhaps I misunderstood or misheard one or two points. But I'm pretty sure I didn't mishear all of them. Isn't it great that I can correct my doctor's epidemiological claims after reading Zvi's posts and half of an epidemiology textbook? I'm glad I can trust my doctor and his epistemology.

Replies from: mingyuan, Dagon
comment by mingyuan · 2020-10-09T17:09:26.778Z · LW(p) · GW(p)

Eli just took a plane ride to get to CA and brought a P100, but they told him he had to wear a cloth mask, that was the rule. So he wore a cloth mask under the P100, which of course broke the seal. I feel you.

Replies from: ChristianKl
comment by ChristianKl · 2020-10-09T17:32:20.872Z · LW(p) · GW(p)

I don't think that policy is unreasonable for a plane ride. Just because someone wears a P100 mask doesn't mean that their mask filters outgoing air, as that's not a design goal for most of the use cases of P100 masks.

Checking on a case-by-case basis whether a particular P100 mask is not designed like an average P100 mask is likely not feasible in that context. 

comment by Dagon · 2020-10-09T19:06:29.704Z · LW(p) · GW(p)

What do you call the person who graduates last in their med school class?  Doctor.   And remember that GPs are weighted toward the friendly area of doctor-quality space rather than the hyper-competent.   Further remember that consultants (including experts on almost all topics) are generally narrow in their understanding of things - even if they are well above the median at their actual job (for a GP, dispensing common medication and identifying situations that need referral to a specialist), that doesn't indicate they're going to be well-informed even for adjacent topics.

That said, this level of misunderstanding on topics that impact patient behavior and outcome (mask use, other virus precautions) is pretty sub-par.  The cynic in me estimates it's the bottom quartile of front-line medical providers, but I hope it's closer to the bottom decile.  Looking into an alternate provider seems quite justified.

Replies from: ChristianKl
comment by ChristianKl · 2020-10-09T20:11:41.072Z · LW(p) · GW(p)

What do you call the person who graduates last in their med school class?  Doctor.  

In the US that isn't the case. There are limited places for internships and the worst person in medical school might not get a place for an internship and thus is not allowed to be a doctor. The medical system is heavily gated to keep people out.

comment by TurnTrout · 2020-07-14T02:26:18.654Z · LW(p) · GW(p)

When I notice I feel frustrated, unproductive, lethargic, etc, I run down a simple checklist:

  • Do I need to eat food?
  • Am I drinking lots of water?
  • Have I exercised today?
  • Did I get enough sleep last night? 
    • If not, what can I do now to make sure I get more tonight?
  • Have I looked away from the screen recently?
  • Have I walked around in the last 20 minutes?

It's simple, but 80%+ of the time, it fixes the issue.

Replies from: Viliam, mr-hire
comment by Viliam · 2020-07-14T19:32:08.987Z · LW(p) · GW(p)

There is a "HALT: hungry? angry? lonely? tired?" mnemonic, but I like that your list includes water and walking and exercise. Now just please make it easier to remember.

Replies from: AllAmericanBreakfast
comment by AllAmericanBreakfast · 2020-07-15T22:33:10.546Z · LW(p) · GW(p)

How about THREES: Thirsty Hungry Restless Eyestrain Exercise?

comment by Matt Goldenberg (mr-hire) · 2020-07-14T03:38:33.959Z · LW(p) · GW(p)

Hey can I steal this for a course I'm teaching? (I'll give you credit).

Replies from: TurnTrout
comment by TurnTrout · 2020-07-14T11:49:27.742Z · LW(p) · GW(p)


comment by TurnTrout · 2019-12-25T23:07:04.811Z · LW(p) · GW(p)

Judgment in Managerial Decision Making says that (subconscious) misapplication of e.g. the representativeness heuristic causes insensitivity to base rates and to sample size, failure to reason about probabilities correctly, failure to consider regression to the mean, and the conjunction fallacy. My model of this is that representativeness / availability / confirmation bias work off of a mechanism somewhat similar to attention in neural networks: due to how the brain performs time-limited search, more salient/recent memories get prioritized for recall.

The availability heuristic goes wrong when our saliency-weighted perception of the frequency of events is a biased estimator of the real frequency, or maybe when we just happen to be extrapolating off of a very small sample size. Concepts get inappropriately activated in our mind, and we therefore reason incorrectly. Attention also explains anchoring: you can more readily bring to mind things related to your anchor due to salience.
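A toy sketch of the "saliency-weighted frequency" idea, under loud assumptions: the event names, the frequencies, and the salience weights are all made up for illustration, and modelling recall as frequency times salience is just one simple way to cash out the analogy, not a claim about how the brain does it.

```python
def recalled_frequencies(true_freq, salience):
    """Estimate event frequencies from salience-weighted recall.

    Each event is recalled in proportion to (true frequency * salience),
    then the recalled mass is normalized to sum to 1.
    """
    weights = {event: true_freq[event] * salience[event] for event in true_freq}
    total = sum(weights.values())
    return {event: weight / total for event, weight in weights.items()}


# Hypothetical numbers: a rare but vivid event versus a common, forgettable one.
true_freq = {"mundane": 0.99, "vivid": 0.01}
salience = {"mundane": 1.0, "vivid": 50.0}

estimate = recalled_frequencies(true_freq, salience)
for event in true_freq:
    print(f"{event}: true={true_freq[event]:.2f}, recalled={estimate[event]:.2f}")
```

If salience were equal across events, the estimate would match the true frequencies; the bias comes entirely from the unequal weights.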

The case for confirmation bias seems to be a little more involved: first, we had evolutionary pressure to win arguments, which means our search is meant to find supportive arguments and avoid even subconsciously signalling that we are aware of the existence of counterarguments. This means that those supportive arguments feel salient, and we (perhaps by "design") get to feel unbiased - we aren't consciously discarding evidence, we're just following our normal search/reasoning process! This is what our search algorithm feels like from the inside. [LW · GW]

This reasoning feels clicky, but I'm just treating it as an interesting perspective for now.

comment by TurnTrout · 2019-11-20T21:52:55.015Z · LW(p) · GW(p)

I feel very excited by the AI alignment discussion group I'm running at Oregon State University. Three weeks ago, most attendees didn't know much about "AI security mindset"-ish considerations. This week, I asked the question "what, if anything, could go wrong with a superhuman reward maximizer which is rewarded for pictures of smiling people? Don't just fit a bad story to the reward function. Think carefully."

There was some discussion and initial optimism, after which someone said "wait, those optimistic solutions are just the ones you'd prioritize! What's that called, again?" (It's called anthropomorphic optimism)

I'm so proud.

comment by TurnTrout · 2019-11-04T01:29:44.252Z · LW(p) · GW(p)

With respect to the integers, 2 is prime. But with respect to the Gaussian integers, it's not: it has factorization $2 = (1+i)(1-i)$. Here's what's happening.

You can view complex multiplication as scaling and rotating the complex plane. So, when we take our unit vector 1 and multiply by $1+i$, we're scaling it by $\sqrt{2}$ and rotating it counterclockwise by $45^\circ$:

This gets us to the purple vector. Now, we multiply by $1-i$, scaling it up by $\sqrt{2}$ again (in green), and rotating it clockwise again by the same amount. You can even deal with the scaling and rotations separately (scale twice by $\sqrt{2}$, with zero net rotation).
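A quick numerical check of the scale-and-rotate picture, as a minimal sketch using Python's built-in complex numbers. The variable names are mine; `cmath.polar` returns the (magnitude, angle) pair for a complex number.

```python
import cmath
import math

a = complex(1, 1)    # 1 + i
b = complex(1, -1)   # 1 - i

# Multiplying the two Gaussian integers recovers 2.
product = a * b

# Polar form: each factor is a scaling plus a rotation.
scale_a, angle_a = cmath.polar(a)
scale_b, angle_b = cmath.polar(b)

print("product:", product)
print("scales:", scale_a, scale_b)
print("angles (degrees):", math.degrees(angle_a), math.degrees(angle_b))
print("net scale:", scale_a * scale_b)
print("net rotation (degrees):", math.degrees(angle_a + angle_b))
```

The magnitudes multiply and the angles add, so the two rotations cancel and only the doubling is left.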

comment by TurnTrout · 2021-04-27T19:14:36.510Z · LW(p) · GW(p)

The Pfizer phase 3 study's last endpoint is 7 days after the second shot. Does anyone know why the CDC recommends waiting 2 weeks for full protection? Are they just being the CDC again?

Replies from: jimrandomh
comment by jimrandomh · 2021-04-28T01:20:23.607Z · LW(p) · GW(p)

People don't really distinguish between "I am protected" and "I am safe for others to be around". If someone got infected prior to their vaccination and had a relatively-long incubation period, they could infect others; I don't think it's a coincidence that two weeks is also the recommended self-isolation period for people who may have been exposed.

comment by TurnTrout · 2020-01-05T02:27:54.205Z · LW(p) · GW(p)

Suppose you could choose how much time to spend at your local library, during which:

  • you do not age. Time stands still outside; no one enters or exits the library (which is otherwise devoid of people).
  • you don't need to sleep/eat/get sunlight/etc
  • you can use any computers, but not access the internet or otherwise bring in materials with you
  • you can't leave before the requested time is up

Suppose you don't go crazy from solitary confinement, etc. Remember that value drift is a potential thing.

How long would you ask for?

Replies from: FactorialCode
comment by FactorialCode · 2020-01-06T19:38:59.584Z · LW(p) · GW(p)

How good are the computers?

Replies from: TurnTrout
comment by TurnTrout · 2020-01-06T20:15:42.012Z · LW(p) · GW(p)

Windows machines circa ~2013. Let’s say 128GB hard drives which magically never fail, for 10 PCs.

Replies from: FactorialCode
comment by FactorialCode · 2020-01-07T17:01:17.079Z · LW(p) · GW(p)

Probably 3-5 years then. I'd use it to get a stronger foundation in low level programming skills, math and physics. The limiting factors would be entertainment in the library to keep me sane and the inevitable degradation of my social skills from so much time spent alone.

comment by TurnTrout · 2021-04-29T17:29:54.636Z · LW(p) · GW(p)

When proving theorems for my research, I often take time to consider the weakest conditions under which the desired result holds - even if it's just a relatively unimportant and narrow lemma. By understanding the weakest conditions, you isolate the load-bearing requirements for the phenomenon of interest. I find this helps me build better gears-level models of the mathematical object I'm studying. Furthermore, understanding the result in generality allows me to recognize analogies and cross-over opportunities in the future. Lastly, I just find this plain satisfying.

comment by TurnTrout · 2020-11-23T03:10:19.610Z · LW(p) · GW(p)

I remarked to my brother, Josh, that when most people find themselves hopefully saying "here's how X can still happen!", it's a lost cause and they should stop grasping for straws and move on with their lives. Josh grinned, pulled out his cryonics necklace, and said "here's how I can still not die!"

comment by TurnTrout · 2020-08-28T03:12:31.419Z · LW(p) · GW(p)

Does Venting Anger Feed or Extinguish the Flame? Catharsis, Rumination, Distraction, Anger, and Aggressive Responding

Does distraction or rumination work better to diffuse anger? Catharsis theory predicts that rumination works best, but empirical evidence is lacking. In this study, angered participants hit a punching bag and thought about the person who had angered them (rumination group) or thought about becoming physically fit (distraction group). After hitting the punching bag, they reported how angry they felt. Next, they were given the chance to administer loud blasts of noise to the person who had angered them. There also was a no punching bag control group. People in the rumination group felt angrier than did people in the distraction or control groups. People in the rumination group were also most aggressive, followed respectively by people in the distraction and control groups. Rumination increased rather than decreased anger and aggression. Doing nothing at all was more effective than venting anger. These results directly contradict catharsis theory.

Interesting. A cursory !scholar search indicates these results have replicated, but I haven't done an in-depth review.

Replies from: MakoYass, Raemon, capybaralet
comment by MakoYass · 2020-08-31T02:39:48.128Z · LW(p) · GW(p)

It would be interesting to see a more long-term study about habits around processing anger.

For instance, randomly assigning people different advice about processing anger (likely to have quite an impact on them, I don't think the average person receives much advice in that class) and then checking in on them a few years later and asking them things like: how many enemies do they have, how many enemies have they successfully defeated, how many of their interpersonal issues do they resolve successfully?

comment by Raemon · 2020-08-28T04:49:24.751Z · LW(p) · GW(p)

Boggling a bit at the "can you actually reliably find angry people and/or make people angry on purpose?"

comment by capybaralet · 2020-09-15T08:27:19.399Z · LW(p) · GW(p)

I found this fascinating... it's rare these days that I see some fundamental assumption in my thinking that I didn't even realize I was making laid bare like this... it is particularly striking because I think I could easily have realized that my own experience contradicts catharsis theory... I know that I can distract myself to become less angry, but I usually don't want to, in the moment.

I think that desire is driven by emotion, but rationalized via something like catharsis theory. I want to try and rescue catharsis theory by saying that maybe there are negative long-term effects of being distracted from feelings of anger (e.g. a build up of resentment). I wonder how much this is also a rationalization.

I also wonder how accurately the authors have characterized catharsis theory, and how much to identify it with the "hydraulic model of anger"... I would imagine that there are lots of attempts along the lines of what I suggested to try and rescue catharsis theory by refining or moving away from the hydraulic model. A highly general version might claim: "over a long time horizon, not 'venting' anger is net negative".

comment by TurnTrout · 2020-07-29T03:00:37.560Z · LW(p) · GW(p)

This might be the best figure I've ever seen in a textbook. Talk about making a point! 

Molecular Biology of the Cell, Alberts.

comment by TurnTrout · 2019-10-01T01:07:11.804Z · LW(p) · GW(p)

An exercise in the companion workbook to the Feynman Lectures on Physics asked me to compute a rather arduous numerical simulation. At first, this seemed like a "pass" in favor of an exercise more amenable to analytic and conceptual analysis; arithmetic really bores me. Then, I realized I was being dumb - I'm a computer scientist.

Suddenly, this exercise became very cool, as I quickly figured out the equations and code, crunched the numbers in an instant, and churned out a nice scatterplot. This seems like a case where cross-domain competence is unusually helpful (although it's not like I had to bust out any esoteric theoretical CS knowledge). I'm wondering whether this kind of thing will compound as I learn more and more areas; whether previously arduous or difficult exercises become easy when attacked with well-honed tools and frames from other disciplines.

comment by TurnTrout · 2021-03-04T22:33:35.030Z · LW(p) · GW(p)

Amazing how much I can get done if I chant to myself "I'm just writing two pages of garbage abstract/introduction/related work, it's garbage, it's just garbage, don't fix it rn, keep typing"

comment by TurnTrout · 2020-07-22T14:21:33.404Z · LW(p) · GW(p)

I never thought I'd be seriously testing the reasoning abilities of an AI in 2020 [LW · GW]. 

Looking back, history feels easy to predict; hindsight + the hard work of historians makes it (feel) easy to pinpoint the key portents. Given what we think about AI risk, in hindsight, might this have been the most disturbing development of 2020 thus far? 

I personally lean towards "no", because this scaling seemed somewhat predictable from GPT-2 (flag - possible hindsight bias), and because 2020 has been so awful so far. But it seems possible, at least. I don't really know what update GPT-3 is to my AI risk estimates & timelines.

Replies from: gwern
comment by gwern · 2020-07-22T16:42:14.009Z · LW(p) · GW(p)

DL so far has been easy to predict - if you bought into a specific theory of connectionism & scaling espoused by Schmidhuber, Moravec, Sutskever, and a few others, as I point out in https://www.gwern.net/newsletter/2019/13#what-progress & https://www.gwern.net/newsletter/2020/05#gpt-3 . Even the dates are more or less correct! The really surprising thing is that that particular extreme fringe lunatic theory turned out to be correct. So the question is, was everyone else wrong for the right reasons (similar to the Greeks dismissing heliocentrism for excellent reasons yet still being wrong), or wrong for the wrong reasons, and why, and how can we prevent that from happening again and spending the next decade being surprised in potentially very bad ways?

comment by TurnTrout · 2021-07-27T13:18:59.766Z · LW(p) · GW(p)

My power-seeking theorems [? · GW] seem a bit like Vingean reflection [LW · GW]. In Vingean reflection, you reason about an agent which is significantly smarter than you: if I'm playing chess against an opponent who plays the optimal policy for the chess objective function, then I predict that I'll lose the game. I predict that I'll lose, even though I can't predict my opponent's (optimal) moves - otherwise I'd probably be that good myself.

My power-seeking theorems show that most objectives have optimal policies which e.g. avoid shutdown and survive into the far future, even without saying what particular actions these policies take to get there. I may not even be able to compute a single optimal policy for a single non-trivial objective, but I can still reason about the statistical tendencies of optimal policies.
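Here's a toy Monte Carlo sketch of the "statistical tendencies" flavor of this claim. It is not the theorems themselves. The setup is an assumption for illustration: from the start state the agent either shuts down (one terminal outcome) or stays on and then picks among `num_other_outcomes` terminal outcomes, with each outcome's reward drawn i.i.d. uniform on [0, 1]. The function name and parameters are hypothetical.

```python
import random


def fraction_avoiding_shutdown(num_other_outcomes, num_samples=10_000, seed=0):
    """Fraction of sampled reward functions whose optimal policy avoids shutdown.

    Toy model: one terminal 'shutdown' outcome versus `num_other_outcomes`
    terminal outcomes that are only reachable by staying on. Rewards are
    i.i.d. uniform on [0, 1] (an illustrative assumption).
    """
    rng = random.Random(seed)
    avoided = 0
    for _ in range(num_samples):
        shutdown_reward = rng.random()
        other_rewards = [rng.random() for _ in range(num_other_outcomes)]
        # The optimal policy stays on exactly when some other outcome beats shutdown.
        if max(other_rewards) > shutdown_reward:
            avoided += 1
    return avoided / num_samples


for k in (1, 3, 9):
    print(k, fraction_avoiding_shutdown(k))
```

By symmetry, the best of the k + 1 outcomes is equally likely to be any of them, so the fraction should hover around k / (k + 1). Note that the code never says which non-shutdown outcome gets chosen for a given reward function; it only counts how often shutdown loses.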

Replies from: Pattern
comment by Pattern · 2021-07-28T06:17:53.734Z · LW(p) · GW(p)
if I'm playing chess against an opponent who plays the optimal policy for the chess objective function

1. I predict that you will never encounter such an opponent. Solving chess is hard.*

2. Optimal play within a game might not be optimal overall (others can learn from the strategy).

Why does this matter? If the theorems hold, even for 'not optimal, but still great' policies (say, for chess), then the distinction is irrelevant. Though for more complicated (or non-zero sum) games, the optimal move/policy may depend on the other player's move/policy.

(I'm not sure what 'avoid shutdown' looks like in chess.)


*with 10^43 legal positions in chess, it will take an impossibly long time to compute a perfect strategy with any feasible technology.

-source: https://en.wikipedia.org/wiki/Chess#Mathematics which lists its source from 1977
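A back-of-the-envelope version of that footnote, as a rough sketch. The throughput figure is a hypothetical assumption, and "visit each legal position once" is a simplification of what solving chess would actually involve.

```python
LEGAL_POSITIONS = 10**43          # figure quoted in the footnote above
POSITIONS_PER_SECOND = 10**18     # hypothetical throughput, chosen to be generous
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time to touch every position once at the assumed rate.
seconds = LEGAL_POSITIONS / POSITIONS_PER_SECOND
years = seconds / SECONDS_PER_YEAR

print(f"{years:.2e} years")
```

Even at that assumed rate, the answer is on the order of 10^17 years.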

comment by TurnTrout · 2020-11-25T17:38:31.692Z · LW(p) · GW(p)

Over the last 2.5 years, I've read a lot of math textbooks. Not using Anki / spaced repetition systems over that time has been an enormous mistake. My factual recall seems worse-than-average among my peers, but when supplemented with Anki, it's far better than average (hence, I was able to learn 2000+ Japanese characters in 90 days, in college). 

I considered using Anki for math in early 2018, but I dismissed it quickly because I hadn't had good experience using that application for things which weren't languages. I should have at least tried to see if I could repurpose my previous success! I'm now happily using Anki to learn measure theory and ring theory, and I can already tell that it's sticking far better. 

This mistake has had real consequences. I've gotten far better at proofs and I'm quite good at real analysis (I passed a self-administered graduate qualifying exam in the spring), but I have to look some things up for probability theory. Not a good look in interviews. I might have to spend weeks of extra time reviewing things I could have already stashed away in an Anki deck.


Replies from: An1lam
comment by NaiveTortoise (An1lam) · 2020-11-28T19:53:54.950Z · LW(p) · GW(p)

I'm curious what sort of things you're Anki-fying (e.g. a few examples for measure theory).

Replies from: TurnTrout
comment by TurnTrout · 2020-07-26T18:33:52.588Z · LW(p) · GW(p)

An additional consideration for early work on interpretability: it slightly increases the chance we actually get an early warning shot. If a system misbehaves, we can inspect its cognition and (hopefully) find hints of intentional deception. Could motivate thousands of additional researcher-hours being put into alignment.

Replies from: Raemon
comment by Raemon · 2020-07-26T21:20:13.825Z · LW(p) · GW(p)

That's an interesting point.

comment by TurnTrout · 2020-07-19T17:58:52.262Z · LW(p) · GW(p)

Today, let's read about GPT-3's obsession with Shrek

As for me, I think Shrek is important because the most valuable thing in life is happiness. I mean this quite literally. There's a mountain of evidence for it, if you're willing to look at the research. And I think movies can help us get there. Or at least not get in the way.

Now, when I say "happiness," I'm not talking about the transient buzz that you get from, say, heroin. I'm talking about a sense of fulfillment. A sense that you are where you're meant to be. That you are doing what you're meant to do. That you are who you're meant to be. And, most importantly, that you like who you are.

It's hard to find this sense of fulfillment in life. For some people, it comes from family. For some, it comes from career. For some, it comes from a hobby. For some, it comes from religion. For some, it comes from drugs.

The problem is, these things are not always enough. And this is where Shrek comes in.

See, the first time I watched Shrek, I knew something was wrong with it. Not with the movie itself, of course—that's a classic and a timeless masterpiece—but with me. And the problem was that I couldn't figure out what was wrong.

You see, watching Shrek for the first time is an experience that everyone should have. You sit there in the dark, watching the story unfold on the screen in front of you. And as you watch it, you find yourself actually caring about the characters. You laugh when they laugh. You want them to succeed. You feel sad when something bad happens to them.

Now, I'll be the first to admit that this experience isn't unique to Shrek. A lot of movies can do this. The difference is that with most movies, you watch them once and that's the end of it. You may remember certain scenes or jokes or what-have-you, but you don't dwell on them.

But with Shrek, it's different. After you've watched it once, you'll probably want to watch it again. And again. And again. Before you know it, you've seen it fifty times. And each time, you pick up on something new.

I'll give another example. Let's say you've just watched the movie Tron, and you really liked it. So, you watch it again. This time, you pay attention to the cinematography. The way the light cycles chase each other on the screen. The way the discs explode when they hit something. The way the digitized effects blend into the real-life footage. The way the scenes are set to an electronic version of Liszt's 2nd Hungarian Rhapsody.

This attention to detail only increases your enjoyment of the movie. In fact, you enjoy it so much that you want to share this experience with others. So, the next time you're with a group of friends, you tell them how you watched Tron and how much you liked the cinematography.

They stare at you blankly.

You try again. You say, "You know, the way they did the light cycles and stuff."

Still nothing.

Finally, one of your friends gets it. "Oh yeah!" he says. "I remember that. It was cool how they did that."

But he doesn't really remember it. Not the way you remember it. To him, it's just a vague idea of something that happened, not an ingrained memory seared into his brain like it is for you. You see his reaction and you try to forget about it. After all, what does it matter? You know what you saw, and in your mind, that's all that matters.

But it's this mindset that keeps you going back to Shrek. And it's this mindset that will lead you to other movies, and then other TV shows, and then books, and then games, and then pictures of bunny rabbits with misplaced captions on Tumblr.

But I'm getting ahead of myself. This is a story about how I lost myself, but it's not my story. It's my brother's. My brother—let's call him Michael—had a similar experience with Shrek, except his was even more powerful because it was the first time he'd experienced it.

At the time, our family had just gotten cable, and one of the channels happened to be MTV. At this point in time, MTV was still playing music videos, so my brother and I would always sit in front of the TV watching music videos whenever we could. One day, Shrek came on. We didn't know anything about it. We hadn't read the book it was based on, and we hadn't seen the trailers. All we knew is that there was a movie with a bunch of animals talking.

When the movie ended, we were speechless. In fact, our jaws were on the floor. We didn't know movies could make you feel this way. For the next few days, all we could talk about was Shrek. We told our parents, our friends, anyone who would listen about this movie we'd seen. Of course, none of them understood. I mean, how could they? They hadn't seen it.

But something else happened when we watched that movie. It got under our skin in a way nothing else ever had. After the first time, we had to watch it again. And again. And again. Soon, we knew every line in the movie. Not just the main ones, but every single line. And we didn't just watch it. We analyzed it. We took scenes apart and put them back together again. We tried to find all the little details that the creators had hidden in the background artwork.

As the years passed, this process never changed. Shrek became a part of us. I remember getting sick one year and missing a week of school. I stayed in bed and watched Shrek at least once every day that week.

A few years later, a sequel was released. My brother and I went to see it on opening night. We saw it again the next day, and again the next day, and again the day after that… well, you get the idea.

We never did anything with other kids our age. Our lives were Shrek, and Shrek alone. When people would ask us what we were into, we always had the same answer: Shrek. They usually laughed and made fun of us, but we didn't care. As far as we were concerned, they just didn't get it.

When high school came around, I decided to change things up a bit. Instead of watching Shrek, I listened to music and read books. Michael didn't like these changes too much. He stuck with the Shrek stuff. I sometimes wonder where we would be now if I had encouraged him to listen to music and read books instead.

Replies from: ChristianKl
comment by ChristianKl · 2020-07-25T18:05:03.628Z · LW(p) · GW(p)

What's the input that produced the text from GPT-3?

Replies from: TurnTrout
comment by TurnTrout · 2020-07-25T19:40:45.378Z · LW(p) · GW(p)

Two Sequences posts... lol... Here's the full transcript

comment by TurnTrout · 2020-03-06T02:06:02.268Z · LW(p) · GW(p)

Cool Math Concept You Never Realized You Wanted: Fréchet distance.

Imagine a man traversing a finite curved path while walking his dog on a leash, with the dog traversing a separate one. Each can vary their speed to keep slack in the leash, but neither can move backwards. The Fréchet distance between the two curves is the length of the shortest leash sufficient for both to traverse their separate paths. Note that the definition is symmetric with respect to the two curves: the Fréchet distance would be the same if the dog were walking its owner.

The Fréchet distance between two concentric circles of radius $r_1$ and $r_2$ respectively is $|r_1 - r_2|$. The longest leash is required when the owner stands still and the dog travels to the opposite side of the circle ($r_1 + r_2$), and the shortest leash when both owner and dog walk at a constant angular velocity around the circle ($|r_1 - r_2|$).
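To make the leash picture concrete, here's a minimal sketch of the discrete Fréchet distance (the standard dynamic-programming approximation over sampled points, not the continuous definition), applied to two concentric circles. The radii 1 and 2, the sample count, and the choice to start both walkers at angle 0 are assumptions for illustration.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences.

    ca[i][j] is the shortest leash that lets the walkers reach p[i] and q[j]
    when each may only stay put or step forward along its own sequence.
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def circle(radius, samples=360):
    """Points on a circle centered at the origin, one full loop starting at angle 0."""
    return [
        (radius * math.cos(2 * math.pi * k / samples),
         radius * math.sin(2 * math.pi * k / samples))
        for k in range(samples + 1)
    ]


r1, r2 = 1.0, 2.0                 # illustrative radii
owner_path = circle(r1)
dog_path = circle(r2)

# Both walk in lockstep: the leash never needs to exceed |r1 - r2|.
shortest_leash = discrete_frechet(owner_path, dog_path)

# Owner stands still at the start while the dog does the whole loop.
standing_still_leash = max(math.dist(owner_path[0], pt) for pt in dog_path)

print(shortest_leash, standing_still_leash)
```

The dynamic program searches over all monotone ways of pairing up the two walks, so it finds the lockstep strategy on its own rather than having it hard-coded.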

comment by TurnTrout · 2020-01-02T16:18:14.203Z · LW(p) · GW(p)

Earlier today, I became curious why extrinsic motivation tends to preclude or decrease intrinsic motivation. This phenomenon is known as overjustification. There are likely agreed-upon theories for this, but here's some stream-of-consciousness as I reason and read through summarized experimental results. (ETA: Looks like there isn't consensus on why this happens)

My first hypothesis was that recognizing external rewards somehow precludes activation of curiosity-circuits in our brain. I'm imagining a kid engrossed in a puzzle. Then, they're told that they'll be given $10 upon completion. I'm predicting that the kid won't become significantly less engaged, which surprises me?

third graders who were rewarded with a book showed more reading behaviour in the future, implying that some rewards do not undermine intrinsic motivation.

Might this be because the reward for reading is more reading, which doesn't undermine the intrinsic interest in reading? You aren't looking forward to escaping the task, after all.

While the provision of extrinsic rewards might reduce the desirability of an activity, the use of extrinsic constraints, such as the threat of punishment, against performing an activity has actually been found to increase one's intrinsic interest in that activity. In one study, when children were given mild threats against playing with an attractive toy, it was found that the threat actually served to increase the child's interest in the toy, which was previously undesirable to the child in the absence of threat.

A few experimental summaries:

1. Researchers at Southern Methodist University conducted an experiment on 188 female university students in which they measured the subjects' continued interest in a cognitive task (a word game) after their initial performance under different incentives.

The subjects were divided into two groups. Members of the first group were told that they would be rewarded for competence. Above-average players would be paid more and below-average players would be paid less. Members of the second group were told that they would be rewarded only for completion. Their pay was scaled by the number of repetitions or the number of hours playing. Afterwards, half of the subjects in each group were told that they over-performed, and the other half were told that they under-performed, regardless of how well each subject actually did.

Members of the first group generally showed greater interest in the game and continued playing for a longer time than the members of the second group. "Over-performers" continued playing longer than "under-performers" in the first group, but "under-performers" continued playing longer than "over-performers" in the second group. This study showed that, when rewards do not reflect competence, higher rewards lead to less intrinsic motivation. But when rewards do reflect competence, higher rewards lead to greater intrinsic motivation.

2. Richard Titmuss suggested that paying for blood donations might reduce the supply of blood donors. To test this, a field experiment with three treatments was conducted. In the first treatment, the donors did not receive compensation. In the second treatment, the donors received a small payment. In the third treatment, donors were given a choice between the payment and an equivalent-valued contribution to charity. None of the three treatments affected the number of male donors, but the second treatment almost halved the number of female donors. However, allowing the contribution to charity fully eliminated this effect.

From a glance at the Wikipedia page, it seems like there's not really expert consensus on why this happens. However, according to self-perception theory,

a person infers causes about his or her own behavior based on external constraints. The presence of a strong constraint (such as a reward) would lead a person to conclude that he or she is performing the behavior solely for the reward, which shifts the person's motivation from intrinsic to extrinsic.

This lines up with my understanding of self-consistency effects.

comment by TurnTrout · 2019-12-15T21:26:26.150Z · LW(p) · GW(p)

Virtue ethics seems like model-free consequentialism to me.

Replies from: JohnSteidley
comment by JohnSteidley · 2020-05-25T20:20:32.795Z · LW(p) · GW(p)

I was thinking along similar lines!

From my notes from 2019-11-24: "Deontology is like the learned policy of bounded rationality of consequentialism"

comment by TurnTrout · 2021-04-02T21:13:51.255Z · LW(p) · GW(p)

The discussion of the HPMOR epilogue in this recent April Fool's thread [LW(p) · GW(p)] was essentially online improv, where no one could acknowledge that without ruining the pretense. Maybe I should do more improv in real life, because I enjoyed it!

comment by TurnTrout · 2021-02-12T15:03:10.670Z · LW(p) · GW(p)

What kind of reasoning would have allowed me to see MySpace in 2004, and then hypothesize the current craziness as a plausible endpoint of social media? Is this problem easier or harder than the problem of 15-20 year AI forecasting?

Replies from: unparadoxed
comment by unparadoxed · 2021-02-12T18:17:09.413Z · LW(p) · GW(p)

Hmm, maybe it would be easier if we focused on one kind/example of craziness. Is there a particular one you have in mind?

comment by TurnTrout · 2021-02-02T02:00:54.805Z · LW(p) · GW(p)

If Hogwarts spits back an error if you try to add a non-integer number of house points, and if you can explain the busy beaver function to Hogwarts, you now have an oracle which answers $n \mid BB(k)$ for arbitrary $k, n \in \mathbb{Z}^+$: just state "$BB(k)/n$ points to Ravenclaw!". You can do this for other problems which reduce to divisibility tests (so, any decision problem $f$ which you can somehow get Hogwarts to compute; if ).

Homework: find a way to safely take over the world using this power, and no other magic. 
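The protocol above can be sketched in code. This is only a toy illustration: `Hogwarts`, `NonIntegerPointsError`, and `divides` are hypothetical names, and since the busy beaver function is uncomputable, `stand_in_function` is a made-up computable placeholder for it. The only signal the caller uses is whether the award raises an error.

```python
from fractions import Fraction


class NonIntegerPointsError(Exception):
    """Raised when a non-integer number of house points is awarded."""


class Hogwarts:
    """Toy stand-in: tracks house points and rejects non-integer awards."""

    def __init__(self):
        self.points = {"Ravenclaw": 0}

    def award(self, house, amount):
        amount = Fraction(amount)
        if amount.denominator != 1:
            raise NonIntegerPointsError(f"{amount} is not an integer")
        self.points[house] = self.points.get(house, 0) + int(amount)


def divides(castle, n, value):
    """Return True iff n divides value, using only the castle's error signal."""
    try:
        castle.award("Ravenclaw", Fraction(value, n))
        return True
    except NonIntegerPointsError:
        return False


def stand_in_function(k):
    # Hypothetical computable placeholder; the real BB(k) cannot be computed.
    return 2 ** k + 1


castle = Hogwarts()
print(divides(castle, 3, stand_in_function(3)))  # 9 / 3 is an integer
print(divides(castle, 2, stand_in_function(3)))  # 9 / 2 is not
```

Note that a successful query really does change the point total, so each "yes" answer has a side effect.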

Replies from: Measure
comment by Measure · 2021-02-02T15:56:59.824Z · LW(p) · GW(p)

I'd be worried about integer overflow with that protocol. If it can understand BB and division, you can probably just ask for the remainder directly and observe the change.

comment by TurnTrout · 2020-09-11T18:59:28.916Z · LW(p) · GW(p)

When I imagine configuring an imaginary pile of blocks, I can feel the blocks in front of me in this fake imaginary plane of existence. I feel aware of their spatial relationships to me, in the same way that it feels different to have your eyes closed in a closet vs in an empty auditorium. 

But what is this mental workspace? Is it disjoint and separated from my normal spatial awareness, or does my brain copy/paste->modify my real-life spatial awareness? Like, if my brother is five feet in front of me, and then I imagine a blade flying five feet in front of me in my imaginary mental space where he doesn't exist, do I reflexively flinch? Does my brain overlay these two mental spaces, or are they separate?

I don't know. When I run the test, I at least flinch at the thought of such a thing happening. This isn't a good experiment because I know what I'm testing for; I need to think of a better test.

comment by TurnTrout · 2020-07-17T14:30:23.330Z · LW(p) · GW(p)

AIDungeon's subscriber-only GPT-3 can do some complex arithmetic, but it's very spotty. Bold text is me.

You say "What happens if I take the square root of 3i?" 

The oracle says: "You'll get a negative number. [wrong] So, for example, the square root of  is ." [correct]
"What?" you say.
 "I just said it," the oracle repeats. 
"But that's ridiculous! The square root of  is not . It's complex. It's  plus a multiple of ." [wrong, but my character is supposed to be playing dumb here]

The oracle frowns. "I'm just telling you what the answer is," she says. 

The oracle explains, "(5+i)x(5+i) = 24+10i. 7i x 8i=-56. 7i x 30=210." [wrong]

"What?" you say. "That's crazy!  isn't even a real number. The square root of  is  with a whole integer  attached."

 The oracle frowns. "I'm just telling you what the answer is," she says. 

The oracle explains, ".

The last line is the impressive part - while  and  is definitely correct. 

Its proofs are definitely awful in the pattern-matching kind of way. Not surprising. 

The oracle explains, " The group axioms are 

(1) a x a = a for all a in the group; (wrong)

(2) a x (b x c) = (a x b) x c for all a, b, c in the group; 

(3) a x (b + c) = (a x b) + (a x c) for all a, b, c in the group." (wrong, that's a field axiom)

The oracle explains, "here's how you prove that any finite abelian group is isomorphic to the direct product of a finite number of finite cyclic groups. First, you notice that if G is a finite abelian group of order n, then G can be described as a set of generators of the form , where the  are the elements of G and no  is equal to zero. (uh... sure)

Now, we want to prove that this is a direct product. To do this, we must show that given any element g in the group, there exists an index i such that . Then we must show that the sequence can be reversed. Given any index j, we must show that there exists an element g such that . Then we're done, because g will be equal to , which is the product of the 's." (nice try, GPT-3)

comment by TurnTrout · 2020-06-10T21:29:33.673Z · LW(p) · GW(p)

The new "Broader Impact" NeurIPS statement is a good step, but incentives are misaligned. Admitting fatally negative impact would set a researcher back in their career, as the paper would be rejected. 

Idea: Consider a dangerous paper which would otherwise have been published. What if that paper were published title-only on the NeurIPS website, so that the researchers can still get career capital?

Problem: How do you ensure resubmission doesn't occur elsewhere?

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-06-10T22:54:12.979Z · LW(p) · GW(p)

The people at NeurIPS who reviewed the paper might notice if resubmission occurred elsewhere? Automated tools might help with this, by searching for specific phrases.

There's been talk of having a Journal of Infohazards. Seems like an idea worth exploring to me. Your suggestion sounds like a much more feasible first step.

Problem: Any entity with halfway decent hacking skills (such as a national government, or clever criminal) would be able to peruse the list of infohazardy titles, look up the authors, cyberstalk them, and then hack into their personal computer and steal the files. We could hope that people would take precautions against this, but I'm not very optimistic. That said, this still seems better than the status quo.

comment by TurnTrout · 2020-05-26T18:01:51.657Z · LW(p) · GW(p)

Sentences spoken aloud are a latent space embedding of our thoughts; when trying to move a thought from our mind to another's, our thoughts are encoded with the aim of minimizing the other person's decoder error.

comment by TurnTrout · 2020-03-19T20:24:38.419Z · LW(p) · GW(p)

Broca’s area handles syntax, while Wernicke’s area handles the semantic side of language processing. Subjects with damage to the latter can speak in syntactically fluent jargon-filled sentences (fluent aphasia) – and they can’t even tell their utterances don’t make sense, because they can’t even make sense of the words leaving their own mouth!

It seems like GPT2 : Broca’s area :: ??? : Wernicke’s area. Are there any cog psych/AI theories on this?

comment by TurnTrout · 2019-12-12T17:14:12.363Z · LW(p) · GW(p)

Going through an intro chem textbook, it immediately strikes me how this should be as appealing and mysterious as the alchemical magic system of Fullmetal Alchemist. "The law of equivalent exchange" ≈ "conservation of energy/elements/mass (the last two holding only for normal chemical reactions)", etc. If only it were natural to take joy in the merely real...

Replies from: Hazard
comment by Hazard · 2019-12-12T17:39:17.386Z · LW(p) · GW(p)

Have you been continuing your self-study schemes into realms beyond math stuff? If so I'm interested in both the motivation and how it's going! I remember having little interest in other non-physics science growing up, but that was also before I got good at learning things and my enjoyment was based on how well it was presented.

Replies from: TurnTrout
comment by TurnTrout · 2019-12-12T17:55:54.030Z · LW(p) · GW(p)

Yeah, I've read a lot of books since my reviews fell off last year, most of them still math. I wasn't able to type reliably until early this summer, so my reviews kinda got derailed. I've read Visual Group Theory, Understanding Machine Learning, Computational Complexity: A Conceptual Perspective, Introduction to the Theory of Computation, An Illustrated Theory of Numbers, most of Tadelis' Game Theory, the beginning of Multiagent Systems, parts of several graph theory textbooks, and I'm going through Munkres' Topology right now. I've gotten through the first fifth of the first Feynman lectures, which has given me an unbelievable amount of mileage for generally reasoning about physics.

I want to go back to my reviews, but I just have a lot of other stuff going on right now. Also, I run into fewer basic confusions than when I was just starting at math, so I generally have less to talk about. I guess I could instead try and re-present the coolest concepts from the book.

My "plan" is to keep learning math until the low graduate level (I still need to at least do complex analysis, topology, field / ring theory, ODEs/PDEs, and something to shore up my atrocious trig skills, and probably more)[1], and then branch off into physics + a "softer" science (anything from microecon to psychology). CS ("done") -> math -> physics -> chem -> bio is the major track for the physical sciences I have in mind, but that might change. I dunno, there's just a lot of stuff I still want to learn. :)

  1. I also still want to learn Bayes nets, category theory, get a much deeper understanding of probability theory, provability logic, and decision theory. ↩︎

Replies from: Hazard
comment by Hazard · 2019-12-12T19:08:03.098Z · LW(p) · GW(p)

Yay learning all the things! Your reviews are fun, also completely understandable putting energy elsewhere. Your energy for more learning is very useful for periodically bouncing myself into more learning.

comment by TurnTrout · 2019-10-01T20:57:16.698Z · LW(p) · GW(p)

We can think about how consumers respond to changes in price by considering the elasticity of the quantity demanded at a given price - how quickly does demand decrease as we raise prices? Price elasticity of demand is defined as $\frac{\%\text{ change in quantity}}{\%\text{ change in price}}$; in other words, for price $p$ and quantity $q$, this is $\frac{p\,\Delta q}{q\,\Delta p}$ (this looks kinda weird, and it wasn't immediately obvious what's happening here...). Revenue is the total amount of cash changing hands: $pq$.

What's happening here is that raising prices is a good idea when the revenue gained (the "price effect") outweighs the revenue lost to falling demand (the "quantity effect"). A lot of words so far for an easy concept:

If price elasticity is greater than 1, demand is elastic and price hikes decrease revenue (and you should probably have a sale). However, if it's less than 1, demand is inelastic and boosting the price increases revenue - demand isn't dropping off quickly enough to drag down the revenue. You can just look at the area of the revenue rectangle for each effect!
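A quick numeric check of the price effect vs. the quantity effect, as a minimal sketch. The function names and all the numbers are made up for illustration, and elasticity is taken as a magnitude. With discrete changes there is a small cross term, so the "elasticity vs. 1" rule is only exact for small price changes.

```python
def price_elasticity(p, q, dp, dq):
    """Magnitude of (% change in quantity) / (% change in price)."""
    return abs((dq / q) / (dp / p))


def revenue_change(p, q, dp, dq):
    """Revenue after the price change minus revenue before it."""
    return (p + dp) * (q + dq) - p * q


# Hypothetical numbers: raise a $10 price by $1, starting from 100 units sold.
elastic = (10.0, 100.0, 1.0, -20.0)    # quantity falls 20% for a 10% price hike
inelastic = (10.0, 100.0, 1.0, -5.0)   # quantity falls 5% for a 10% price hike

for name, case in [("elastic", elastic), ("inelastic", inelastic)]:
    print(name, price_elasticity(*case), revenue_change(*case))
```

In the elastic case the revenue rectangle shrinks (11 x 80 is less than 10 x 100); in the inelastic case it grows (11 x 95 is more than 10 x 100).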

comment by TurnTrout · 2019-09-22T02:28:49.631Z · LW(p) · GW(p)

How does representation interact with consciousness? Suppose you're reasoning about the universe via a partially observable Markov decision process, and that your model is incredibly detailed and accurate. Further suppose you represent states as numbers, as their numeric labels.

To get a handle on what I mean, consider the game of Pac-Man, which can be represented as a finite, deterministic, fully-observable MDP. Think about all possible game screens you can observe, and number them. Now get rid of the game screens. From the perspective of reinforcement learning, you haven't lost anything - all policies yield the same return they did before, the transitions/rules of the game haven't changed - in fact, there's a pretty strong isomorphism I can show between these two MDPs. All you've done is changed the labels - representation means practically nothing to the mathematical object of the MDP, although many eg DRL algorithms should be able to exploit regularities in the representation to reduce sample complexity.
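The relabeling claim can be checked on a toy example. This is a minimal sketch, not Pac-Man: the three-"screen" game, its rewards, and the helper names are all invented for illustration. It runs value iteration on a small deterministic MDP, replaces the descriptive state names with bare integers, and confirms the optimal values match under the bijection.

```python
def optimal_values(states, actions, step, reward, gamma=0.9, sweeps=200):
    """Value iteration for a finite deterministic MDP."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        values = {
            s: max(reward[(s, a)] + gamma * values[step[(s, a)]] for a in actions)
            for s in states
        }
    return values


def relabel(states, step, reward, label):
    """Apply a bijection `label` to every state name in the MDP."""
    new_states = [label[s] for s in states]
    new_step = {(label[s], a): label[t] for (s, a), t in step.items()}
    new_reward = {(label[s], a): r for (s, a), r in reward.items()}
    return new_states, new_step, new_reward


# Hypothetical three-screen "game": states are descriptive strings.
screens = ["start", "pellet", "ghost"]
actions = ["left", "right"]
step = {
    ("start", "left"): "ghost", ("start", "right"): "pellet",
    ("pellet", "left"): "start", ("pellet", "right"): "pellet",
    ("ghost", "left"): "ghost", ("ghost", "right"): "start",
}
reward = {
    ("start", "left"): -1.0, ("start", "right"): 1.0,
    ("pellet", "left"): 0.0, ("pellet", "right"): 0.0,
    ("ghost", "left"): -1.0, ("ghost", "right"): 0.0,
}

# Throw away the "screens": each state becomes a bare number.
label = {s: i for i, s in enumerate(screens)}
num_states, num_step, num_reward = relabel(screens, step, reward, label)

v_screens = optimal_values(screens, actions, step, reward)
v_numbers = optimal_values(num_states, actions, num_step, num_reward)
same = all(abs(v_screens[s] - v_numbers[label[s]]) < 1e-12 for s in screens)
print(same)
```

A tabular method like this never looks inside a state label, which is why nothing changes; an algorithm that generalizes across states from pixel features would see a difference.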

So what does this mean? If you model the world as a partially observable MDP whose states are single numbers... can you still commit mindcrime via your deliberations? Is the structure of the POMDP in your head somehow sufficient for consciousness to be accounted for (like how the theorems of complexity theory govern computers both of flesh and of silicon)? I'm confused.

Replies from: gworley, Vladimir_Nesov
comment by G Gordon Worley III (gworley) · 2019-09-23T17:31:16.845Z · LW(p) · GW(p)

I think a reasonable and related question we don't have a solid answer for is if humans are already capable of mind crime.

For example, maybe Alice is mad at Bob and imagines causing harm to Bob. How well does Alice have to model Bob for her imaginings to be mind crime? If Alice has low cognitive empathy is it not mind crime but if her cognitive empathy is above some level is it then mind crime?

I think we're currently confused enough about what mind crime is such that it's hard to even begin to know how we could answer these questions based on more than gut feelings.

comment by Vladimir_Nesov · 2019-09-22T05:55:58.633Z · LW(p) · GW(p)

I suspect that it doesn't matter how accurate or straightforward a predictor is in modeling people. What would make prediction morally irrelevant is that it's not noticed by the predicted people, irrespective of whether this happens because it spreads the moral weight conferred to them over many possibilities (giving inaccurate prediction), keeps the representation sufficiently baroque, or for some other reason. In the case of inaccurate prediction or baroque representation, it probably does become harder for the predicted people to notice being predicted, and I think this is the actual source of moral irrelevance, not those things on their own. A more direct way of getting the same result is to predict counterfactuals where the people you reason about don't notice the fact that you are observing them, which also gives a form of inaccuracy (imagine that your predicting them is part of their prior, that'll drive the counterfactual further from reality).

comment by TurnTrout · 2019-09-16T22:19:19.496Z · LW(p) · GW(p)

I seem to differently discount different parts of what I want. For example, I'm somewhat willing to postpone fun to low-probability high-fun futures, whereas I'm not willing to do the same with romance.

comment by TurnTrout · 2020-08-02T00:33:10.521Z · LW(p) · GW(p)

If you measure death-badness from behind the veil of ignorance, you’d naively prioritize well-liked, famous people with large families.

Replies from: Pattern
comment by Pattern · 2020-08-02T02:18:30.194Z · LW(p) · GW(p)

Would you prioritize the young from behind the veil of ignorance?

comment by TurnTrout · 2020-06-26T19:41:44.688Z · LW(p) · GW(p)

Idea: learn by making conjectures (math, physical, etc) and then testing them / proving them, based on what I've already learned from a textbook. 

Learning seems easier and faster when I'm curious about one of my own ideas.

Replies from: An1lam, rudi-c
comment by NaiveTortoise (An1lam) · 2020-06-27T14:19:53.202Z · LW(p) · GW(p)

For what it's worth, this is very true for me as well.

I'm also reminded of a story of Robin Hanson from Cryonics magazine:

Robin’s attraction to the more abstract ideas supporting various fields of interest was similarly shown in his approach – or rather, lack thereof – to homework. “In the last two years of college, I simply stopped doing my homework, and started playing with the concepts. I could ace all the exams, but I got a zero on the homework… Someone got scatter plots up there to convince people that you could do better on exams if you did homework.” But there was an outlier on that plot, courtesy of Robin, that said otherwise.

comment by Rudi C (rudi-c) · 2020-06-27T14:12:40.714Z · LW(p) · GW(p)

How do you estimate how hard your invented problems are?

comment by TurnTrout · 2019-09-30T00:29:52.406Z · LW(p) · GW(p)

I had an intuition that attainable utility preservation (RL but you maintain your ability to achieve other goals) points at a broader template for regularization. AUP regularizes the agent's optimal policy to be more palatable towards a bunch of different goals we may wish we had specified. I hinted at the end of Towards a New Impact Measure [LW · GW] that the thing-behind-AUP might produce interesting ML regularization techniques.

This hunch was roughly correct; Model-Agnostic Meta-Learning tunes the network parameters such that they can be quickly adapted to achieve low loss on other tasks (the problem of few-shot learning). The parameters are not overfit on the scant few data points to which the parameters are adapted, which is also interesting.

comment by TurnTrout · 2021-09-15T02:01:28.436Z · LW(p) · GW(p)

Idea: Expert prediction markets on predictions made by theories in the field, with $ for being a good predictor and lots of $ for designing and running a later-replicated experiment whose result the expert community strongly anti-predicted. Lots of problems with the plan, but surprisal-based compensation seems interesting and I haven't heard about it before. 

comment by TurnTrout · 2021-03-02T18:07:49.152Z · LW(p) · GW(p)

I'd like to see research exploring the relevance of intragenomic conflict to AI alignment research. Intragenomic conflict constitutes an in-the-wild example of misalignment, where conflict arises "within an agent" even though the agent's genes have strong instrumental incentives to work together (they share the same body). 

comment by TurnTrout · 2021-02-11T17:38:27.469Z · LW(p) · GW(p)

In an interesting parallel to John Wentworth's Fixing the Good Regulator Theorem [LW · GW], I have an MDP result that says: 

Suppose we're playing a game where I give you a reward function and you give me its optimal value function in the MDP. If you let me do this for $|\mathcal{S}|$ reward functions (one for each state in the environment), and you're able to provide the optimal value function for each, then you know enough to reconstruct the entire environment (up to isomorphism).

Roughly: being able to complete linearly many tasks in the state space means you have enough information to model the entire environment.

comment by TurnTrout · 2020-11-05T21:50:18.519Z · LW(p) · GW(p)

I read someone saying that ~half of the universes in a neighborhood of ours went to Trump. But... this doesn't seem right. Assuming Biden wins in the world we live in, consider the possible perturbations to the mental states of each voter. (Big assumption! We aren't thinking about all possible modifications to the world state. Whatever that means.)

Assume all 2020 voters would be equally affected by a perturbation (which you can just think of as a decision-flip for simplicity, perhaps). Since we're talking about a neighborhood ("worlds pretty close to ours"), each world-modification is limited to N decision flips (where N isn't too big).

  • There are combinatorially more ways for a race to be close (in popular vote) than for it to not be close. But we're talking perturbations, and so since we're assuming Biden wins in this timeline, he's still winning in most other timelines close to ours
    • I don't know whether the electoral college really changes this logic. If we only consider a single state (PA), then it probably doesn't?
  • I'm also going to imagine that most decision-flips didn't have too many downstream effects, but this depends on when the intervention takes place: if it's a week beforehand, maybe people announce changes-of-heart to their families? A lot to think about there. I'll just pretend like they're isolated because I don't feel like thinking about it that long, and it's insanely hard to play out all those effects.
  • Since these decision-flips are independent, you don't get any logical correlations: the fact that I randomly changed my vote, doesn't change how I expect people like me to vote. This is big.

Under my extremely simplified model, the last bullet is what makes me feel like most universes in our neighborhood were probably also Biden victories.
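The simplified model above can be sketched as a small simulation. Everything here is an assumption made for illustration: the vote counts are invented, only the popular vote is modeled, and each "nearby world" is produced by flipping a fixed number of uniformly chosen, independent voters.

```python
import random


def biden_still_wins(biden, trump, flips, rng):
    """Flip `flips` distinct, uniformly chosen voters and report the winner."""
    total = biden + trump
    # Voters 0..biden-1 voted Biden; the rest voted Trump.
    flipped = rng.sample(range(total), flips)
    biden_to_trump = sum(1 for voter in flipped if voter < biden)
    trump_to_biden = flips - biden_to_trump
    new_biden = biden - biden_to_trump + trump_to_biden
    return new_biden > total - new_biden


def fraction_of_biden_worlds(biden, trump, flips, worlds=2000, seed=0):
    """Share of sampled nearby worlds in which Biden still wins."""
    rng = random.Random(seed)
    wins = sum(biden_still_wins(biden, trump, flips, rng) for _ in range(worlds))
    return wins / worlds


# Hypothetical electorate of 10,000 with a 200-vote Biden margin.
for flips in (50, 500, 5000):
    print(flips, fraction_of_biden_worlds(5100, 4900, flips))
```

Each flip moves the margin by at most 2, so with a 200-vote margin no set of fewer than 100 flips can change the outcome. Because the flips are independent, they mostly cancel, and the margin only erodes slowly as the neighborhood grows.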

Replies from: Measure
comment by Measure · 2020-11-06T15:17:24.368Z · LW(p) · GW(p)

I think this depends on the distance considered. In worlds very very close to ours, the vast majority will have the same outcome as ours. As you increase the neighborhood size (I imagine this as considering worlds which diverged from ours more distantly in the past), Trump becomes more likely relative to Biden [edit: more likely than he is relative to Biden in more nearby worlds]. As you continue to expand, other outcomes start to have significant likelihood as well.

Replies from: TurnTrout
comment by TurnTrout · 2020-11-06T17:05:49.323Z · LW(p) · GW(p)

Why do you think that? How do you know that?

Replies from: Measure
comment by Measure · 2020-11-06T17:38:29.555Z · LW(p) · GW(p)

General intuition that "butterfly effect" is basically true, meaning that if a change occurs in a chaotic system, then the size of the downstream effects will tend to increase over time.

Edit: I don't have a good sense of how far back you would have to go to see meaningful change in outcome, just that the farther you go the more likely change becomes.

Replies from: TurnTrout
comment by TurnTrout · 2020-11-06T18:19:25.977Z · LW(p) · GW(p)

Sure, but why would those changes tend to favor Trump as you get outside of a small neighborhood? Like, why would Biden / (Biden or Trump win) < .5? I agree it would at least approach .5 as the neighborhood grows. I think. 

Replies from: Measure
comment by Measure · 2020-11-06T18:52:10.104Z · LW(p) · GW(p)

I think we're in agreement here. I didn't mean to imply that Trump would become more likely than Biden in absolute terms, just that the ratio Trump/Biden would increase.

comment by TurnTrout · 2020-10-16T18:24:48.881Z · LW(p) · GW(p)

Epistemic status: not an expert

Understanding Newton's second law, $\mathbf{F} = \frac{d}{dt}(m\mathbf{v})$.

Consider the vector-valued velocity as a function of time, $\mathbf{v}(t)$. Scale this by the object's mass and you get the momentum function over time. Imagine this momentum function wiggling around over time, the vector from the origin rotating and growing and shrinking.

The second law says that force is the derivative of this rescaled vector function - if an object is more massive, then the same displacement of this rescaled arrow is a proportionally smaller velocity modification, because of the rescaling!

And also, forces have opposite reactions (by conservation of momentum) and equal reactions (by conservation of energy).
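To spell out the rescaling argument for the force law described above, here is a short worked version. It assumes the mass $m$ is constant, which the comment does not state explicitly.

```latex
\begin{align*}
\mathbf{p}(t) &= m\,\mathbf{v}(t) \\
\mathbf{F} &= \frac{d\mathbf{p}}{dt} = m\,\frac{d\mathbf{v}}{dt} = m\,\mathbf{a}
  \quad \text{(constant } m\text{)} \\
\Delta\mathbf{v} &\approx \frac{\mathbf{F}\,\Delta t}{m}
\end{align*}
```

For example, a 10 N force applied for 1 s changes the velocity of a 2 kg object by 5 m/s, but changes the velocity of a 20 kg object by only 0.5 m/s: the same change in momentum, a proportionally smaller change in velocity.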

comment by TurnTrout · 2020-09-27T22:52:55.668Z · LW(p) · GW(p)

What is "real"? I think about myself as a computation embedded in some other computation (i.e. a universe-history). I think "real" describes hypotheses about the environment where my computation lives. What should I think is real? That which an "ideal embedded reasoner" would assign high credence. However that works.

This sensibly suggests that Gimli-in-actual-Ea (LOTR) should believe he lives in Ea, and that Ea is real, even though it isn't our universe's Earth. Also, the notion accounts for indexical uncertainty by punting it to how embedded reasoning should work (a la radical probabilism), without being tautological. Also, it supports both the subjective nature of what one should call "real", and the notion of an actual out-there-somewhere shared reality (multiple computations can be embedded within the same universe-history).

comment by TurnTrout · 2020-07-17T23:01:50.436Z · LW(p) · GW(p)

Tricking AIDungeon's GPT-3 model into writing HPMOR:

You start reading Harry Potter and the Methods of Rationality by Eliezer Yudkowsky:

" "It said to me," said Professor Quirrell, "that it knew me, and that it would hunt me down someday, wherever I tried to hide." His face was rigid, showing no fright.
"Ah," Harry said. "I wouldn't worry about that, Professor Quirrell." It's not like Dementors can actually talk, or think; the structure they have is borrowed from your own mind and expectations...
Now everyone was giving him very strange looks. The Aurors were glancing nervously at each other, at the Dementor, at Harry.
And they stood directly before the Dementor's cage.
"They are wounds in the world," Harry said. "It's just a wild guess, but I'm guessing the one who said that was Godric Gryffindor."
"Yes..." said Dumbledore. "How did you know?"
It is a common misconception, thought Harry, that all the best rationalists are Sorted into Ravenclaw, leaving none for other Houses. This is not so; being Sorted into Ravenclaw indicates that your strongest virtue is curiosity, wondering and desiring to know the true answer. And this is not the only virtue a rationalist needs. Sometimes you have to work hard on a problem, and stick to it for a while. Sometimes you need a clever plan for finding out. And sometimes what you need more than anything else to see an answer, is the courage to face it...
Harry's gaze went to what lay beneath the cloak, the horror far worse than any decaying mummy. Rowena Ravenclaw might also have known, for it was an obvious enough riddle once you saw it as a riddle.
And it was also obvious why the Patronuses were animals. The animals didn't know, and so were sheltered from the fear.
But Harry knew, and would always know, and would never be able to forget. He'd tried to teach himself to face reality without flinching, and though Harry had not yet mastered that art, still those grooves had been worn into his mind, the learned reflex to look toward the painful thought instead of away. Harry would never be able to forget by thinking warm happy thoughts about something else, and that was why the spell hadn't worked for him.
So Harry would think a warm happy thought that wasn't about something else.
Harry drew forth his wand that Professor Flitwick had returned to him, put his feet into the beginning stance for the Patronus Charm.
Within his mind, Harry discarded the last remnants of the peace of the phoenix, put aside the calm, the dreamlike state, remembered instead Fawkes's piercing cry, and roused himself for battle. Called upon all the pieces and elements of himself to awaken. Raised up within himself all the strength that the Patronus Charm could ever draw upon, to put himself into the right frame of mind for the final warm and happy thought; remembered all bright things.
The books his father had bought him.
Mum's smile when Harry had handmade her a mother's day card, an elaborate thing that had used half a pound of spare electronics parts from the garage to flash lights and beep a little tune, and had taken him three days to make.
Professor McGonagall telling him that his parents had died well, protecting him. As they had.
Realizing that Hermione was keeping up with him and even running faster, that they could be true rivals and friends.
Coaxing Draco out of the darkness, watching him slowly move toward the light.
Neville and Seamus and Lavender and Dean and everyone else who looked up to him, everyone that he would have fought to protect if anything threatened Hogwarts.
Everything that made life worth living.
His wand rose into the starting position for the Patronus Charm.
Harry thought of the stars, the image that had almost held off the Dementor even without a Patronus. Only this time, Harry added the missing ingredient, he'd never truly seen it but he'd seen the pictures and the video....

Harry thought of the image of his family, Mum and Dad and Liliana, standing outside the house on Gardiners Street, smiling at him. He thought of them in the back garden of that house, laughing as he chased the gnomes, and that was his final warm and happy thought.
And Harry cast the Patronus Charm.
A burst of silver erupted from the end of his wand and took the form of a deer, shining brightly.
...His mother's eyes.
And the final piece clicked into place.
Harry's mind was flooded with warmth, and the world seemed to still. Time itself seemed to stop, sound itself ceased to be of concern to Harry, all that remained was himself, and the terrible terrible void that surrounded him. But even as he felt himself begin to be pulled into it, a new force intervened.
A new shape of white, a beautiful silver serpent, coiled around him, protecting him from the darkness. The beautiful creature seemed to both give off warmth and energy, while also absorbing it from the void itself.
Harry felt himself pulled backward, away from the Dementor, and he found himself in the dark halls of Azkaban, being shielded by his wonderful silver Patronus.
...He thought of the night sky, of the moon, shining down upon him and his friends, and the stars beyond. His wand began to glow. "Expecto Patronum!" The glow burst from the tip of his wand and formed into a shape, and the shape was glowing white like a shadowless bright light, yet it had form and shape and figure, and that shape was the shadow of the memory of a star.
...He thought of the sunlight. Of warm air and a bright sky. Of happy people and happy times. Of hope and joy. Of a future worth living for, not just worth dying for. And Harry's wand began to glow. And then the golden stag erupted from the tip of his wand, and charged the Dementor with a cry that echoed through the tunnel.

[...] Only this time, Harry added the missing ingredient,
...the reason to fight for. "Everything I love," Harry hissed out, "everything I love, Reflect if you're worth saving." And with that, the first glimmer of white silver started appearing in the air. "What is that?" whispered Hermione. "My Patronus," Harry said softly. "It's not working." The silver was fading, like a bubble that threatened to burst. "I can't cast the Patronus Charm..."

...He imagined the stars as they looked from Earth, imagined his true home, thought of it being there with him, holding off the Dementor. [?!]

The cold was already fading. That wouldn't help against this fear, but another warm memory would. Harry brought to mind a summer's day in primary school, where he'd first learned to cast a spell.
... the most important part. His father, and his mother, waving to him from a dock as a ship bound for Egypt sailed away. Harry filled his mind with the happy thought, let it overtake him, let it become him.

Let it carry him, like an airship on a favorable wind. "Expecto Patronum!" A burst of pure light, the blazing shadow of an immensely tall humanoid, armored and bearing a helm and a sword. [?!]

... a wish beyond the thought, a belief that he could not truly be saved. "Don't die," Harry whispered to himself, as he had when facing Quirrell, a prayer to whatever Gods might be listening, an order to himself rather than request of them. "Don't die, don't die, don't die..." Concentrating solely upon that wish, the image of a star exploding into existence upon his outstretched wand.

... that he'd been too sad to think of before. His mother's love. He had to believe. He had to believe, even if she wasn't really there, that she loved him. He had to believe, or else the Patronus wouldn't come. A few feet away from him, Draco fell to his knees. "No," he whispered. "No!"
That was what broke Harry's concentration. In front of him, the shimmering humanoid shape collapsed, the mist and light fading away. There was silence for a moment. Then a howl of mad laughter shook the chamber, as Mother's face contorted with rage. "Fool! Idiot! You thought you could beat me, with such a pitiful attempt!" Her voice was rising to a shriek.

...the reason why it had been almost enough, and not quite. Hope. I have seen the brightest futures of the world, and I have seen them swept away. "Expecto Patronum!" From the tip of his wand burst a shining Unicorn, white as snow and bright as the sun.
And for just one heartbeat it stood there, before the world went black.

The Dark Lord is gone. The Ministry has fallen. London is burning. But you have a mission, and you will see it through, because this is the last light in the darkness. This is the last hope for the world.

And you will see it done.

Replies from: Pattern, TurnTrout, habryka4
comment by Pattern · 2020-07-19T00:16:46.256Z · LW(p) · GW(p)

I love the ending. It's way more exciting,

and terrifying.

comment by TurnTrout · 2020-07-18T15:39:29.058Z · LW(p) · GW(p)

... that which he had thought was absent. Love. He didn't think of the books, or his parents or Professor McGonagall. He thought of Hermione, and how she had always believed in him. He thought of how she'd helped him in so many ways, not just with homework, not just with fighting the Dark Arts. How she'd tried to help him every day since they'd first met on the Hogwarts Express.

comment by habryka (habryka4) · 2020-07-18T01:18:56.707Z · LW(p) · GW(p)

Mod note: Spoilerified, to shield the eyes of the innocent.

Replies from: TurnTrout
comment by TurnTrout · 2020-07-18T01:34:34.938Z · LW(p) · GW(p)

My bad! Thanks.

comment by TurnTrout · 2020-07-13T13:20:18.523Z · LW(p) · GW(p)

ARCHES distinguishes between single-agent/single-user and single-agent/multi-user alignment scenarios. Given assumptions like "everyone in society is VNM-rational", "societal preferences should also follow VNM rationality", and "if everyone wants a thing, society also wants the thing", Harsanyi's utilitarian theorem shows that the societal utility function is a linear non-negative weighted combination of everyone's utilities. So, in a very narrow (and unrealistic) setting, Harsanyi's theorem tells you how the single-multi solution is built from the single-single solutions.
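Written as a formula, the conclusion of the theorem looks roughly like the following. This is a sketch: the symbols $U_i$ (individual $i$'s VNM utility function), $w_i$ (their weight), and the constant $c$ are notation assumed here for illustration, not taken from ARCHES.

```latex
U_{\text{society}}(x) \;=\; \sum_{i=1}^{n} w_i \, U_i(x) + c,
\qquad w_i \ge 0 \ \text{for all } i .
```

The theorem itself does not say what the weights should be; it only constrains the form of the aggregate.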

This obviously doesn't actually solve either alignment problem. But, it seems like an interesting parallel for what we might eventually want.

comment by TurnTrout · 2020-05-13T15:25:36.291Z · LW(p) · GW(p)

From FLI's AI Alignment Podcast: Inverse Reinforcement Learning and Inferring Human Preferences with Dylan Hadfield-Menell:

Dylan: There’s one example that I think about, which is, say, you’re cooperating with an AI system playing chess. You start working with that AI system, and you discover that if you listen to its suggestions, 90% of the time, it’s actually suggesting the wrong move or a bad move. Would you call that system value-aligned?

Lucas: No, I would not.

Dylan: I think most people wouldn’t. Now, what if I told you that that program was actually implemented as a search that’s using the correct goal test? It actually turns out that if it’s within 10 steps of a winning play, it always finds that for you, but because of computational limitations, it usually doesn’t. Now, is the system value-aligned? I think it’s a little harder to tell here. What I do find is that when I tell people the story, and I start off with the search algorithm with the correct goal test, they almost always say that that is value-aligned but stupid.

There’s an interesting thing going on here, which is we’re not totally sure what the target we’re shooting for is. You can take this thought experiment and push it further. Suppose you’re doing that search, but, now, it says it’s heuristic search that uses the correct goal test but has an adversarially chosen heuristic function. Would that be a value-aligned system? Again, I’m not sure. If the heuristic was adversarially chosen, I’d say probably not. If the heuristic just happened to be bad, then I’m not sure.

Consider the optimizer/optimized distinction: the AI assistant is better described as optimized to either help or stop you from winning the game. This optimization may or may not have been carried out by a process which is "aligned" with you; I think that ascribing intent alignment to the assistant's creator makes more sense. In terms of the adversarial heuristic case, intent alignment seems unlikely.

But, this also feels like passing the buck – hoping that at some point in history, there existed something to which we are comfortable ascribing alignment and responsibility.

comment by TurnTrout · 2020-05-06T17:19:57.756Z · LW(p) · GW(p)

On page 22 of Probabilistic reasoning in intelligent systems, Pearl writes:

Raw experiential data is not amenable to reasoning activities such as prediction and planning; these require that data be abstracted into a representation with a coarser grain. Probabilities are summaries of details lost in this abstraction...

An agent observes a sequence of images displaying either a red or a blue ball. The balls are drawn according to some deterministic rule of the time step. Reasoning directly from the experiential data leads to ~Solomonoff induction. What might Pearl's "coarser grain" look like for a real agent?

Imagine an RNN trained with gradient descent and binary cross-entropy loss function ("given the data so far, did it correctly predict the next draw?"), and suppose the learned predictive accuracy is good. How might this happen?

  1. The network learns to classify whether the most recent input image contains a red or blue ball, for instrumental predictive reasons, and

  2. A recurrent state records salient information about the observed sequence, which could be arbitrarily long. The RNN + learned weights form a low-complexity function approximator in the space of functions on arbitrary-length sequences. My impression is that gradient descent has simplicity as an inductive bias (cf double descent debate).

Being an approximation of some function over arbitrary-length sequences, the network outputs a prediction for the next color, a specific feature of the next image in the sequence. Can this prediction be viewed as nontrivially probabilistic? In other words, could we use the output to learn about the network's "beliefs" over hypotheses which generate the sequence of balls?

The RNN probably isn't approximating the true (deterministic) hypothesis which explains the sequence of balls. Since it's trained to minimize cross-entropy loss, it learns to hedge, essentially making it approximate a distribution over hypotheses. This implicitly defines its "posterior probability distribution".

Under this interpretation, the output is just the measure of hypotheses predicting blue versus the measure predicting red.
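A toy version of this interpretation, as a minimal sketch rather than anything an RNN literally computes: take a small hypothesis class of deterministic periodic rules, weight them by a crude simplicity prior, discard the rules contradicted by the data, and report the remaining measure that predicts blue. The hypothesis class, the `4**(-period)` prior, and all function names are assumptions made for illustration.

```python
from itertools import product


def periodic_hypotheses(max_period=3):
    """Enumerate deterministic rules 'color at time t = pattern[t % period]'.

    Each rule gets prior weight 4**(-period), a crude stand-in for a
    simplicity prior (an assumption of this sketch).
    """
    hypotheses = []
    for period in range(1, max_period + 1):
        for pattern in product("RB", repeat=period):
            rule = lambda t, pattern=pattern: pattern[t % len(pattern)]
            hypotheses.append((rule, 4.0 ** -period))
    return hypotheses


def prob_next_is_blue(observed, hypotheses):
    """Posterior measure of the hypotheses that predict blue at the next step."""
    t = len(observed)
    # Deterministic hypotheses are either consistent with the data or ruled out.
    consistent = [
        (rule, weight)
        for rule, weight in hypotheses
        if all(rule(i) == color for i, color in enumerate(observed))
    ]
    total = sum(weight for _, weight in consistent)
    if total == 0:
        raise ValueError("no hypothesis is consistent with the observations")
    blue = sum(weight for rule, weight in consistent if rule(t) == "B")
    return blue / total


hypotheses = periodic_hypotheses()
print(prob_next_is_blue("R", hypotheses))     # hedged prediction after one draw
print(prob_next_is_blue("RBRB", hypotheses))  # only the period-2 rule survives
```

After a single red ball the output is strictly between 0 and 1, because surviving hypotheses disagree about the next color; after "RBRB" only one rule in this small class remains, so the prediction becomes certain.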

Replies from: TurnTrout, An1lam
comment by TurnTrout · 2020-05-06T17:37:19.042Z · LW(p) · GW(p)

In particular, the coarse-grain is what I mentioned in 1) – beliefs are easier to manage with respect to a fixed featurization of the observation space.

comment by NaiveTortoise (An1lam) · 2020-05-06T19:43:08.114Z · LW(p) · GW(p)

Only related to the first part of your post, I suspect Pearl!2020 would say the coarse-grained model should be some sort of causal model on which we can do counterfactual reasoning.

comment by TurnTrout · 2020-04-28T19:23:47.063Z · LW(p) · GW(p)

We can imagine aliens building a superintelligent agent which helps them get what they want. This is a special case of aliens inventing tools. What kind of general process should these aliens use – how should they go about designing such an agent?

Assume that these aliens want things in the colloquial sense (not that they’re eg nontrivially VNM EU maximizers) and that a reasonable observer would say they’re closer to being rational than antirational. Then it seems[1] like these aliens eventually steer towards reflectively coherent rationality (provided they don’t blow themselves to hell before they get there): given time, they tend to act to get what they want, and act to become more rational. But, they aren’t fully “rational”, and they want to build a smart thing that helps them. What should they do?

In this situation, it seems like they should build an agent which empowers them & increases their flexible control over the future, since they don’t fully know what they want now. Lots of flexible control means they can better error-correct and preserve value for what they end up believing they actually want. This also protects them from catastrophe and unaligned competitor agents.

  1. I don’t know if this is formally and literally always true, I’m just trying to gesture at an intuition about what kind of agentic process these aliens are. ↩︎

comment by TurnTrout · 2020-04-15T22:12:03.952Z · LW(p) · GW(p)

Ordinal preferences just tell you which outcomes you like more than others: apples more than oranges.

Interval scale preferences assign numbers to outcomes, which communicates how close outcomes are in value: kiwi 1, orange 5, apple 6. You can say that apples have 5 times the advantage over kiwis that they do over oranges, but you can't say that apples are six times as good as kiwis. Fahrenheit and Celsius are also like this.
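A small sketch of what "interval scale" buys you, using the fruit numbers above. Rescaling by a positive affine map (the same kind of change as Celsius to Fahrenheit) leaves ratios of differences untouched but changes ratios of the values themselves. The function names and the particular scale and shift are made up for illustration.

```python
values = {"kiwi": 1.0, "orange": 5.0, "apple": 6.0}


def rescale(values, scale, shift):
    """Apply a positive affine map u -> scale * u + shift (scale > 0)."""
    return {k: scale * v + shift for k, v in values.items()}


def difference_ratio(v):
    """(apple - kiwi) / (apple - orange): meaningful on an interval scale."""
    return (v["apple"] - v["kiwi"]) / (v["apple"] - v["orange"])


def value_ratio(v):
    """apple / kiwi: only meaningful on a ratio scale."""
    return v["apple"] / v["kiwi"]


shifted = rescale(values, scale=2.0, shift=10.0)
print(difference_ratio(values), difference_ratio(shifted))  # unchanged
print(value_ratio(values), value_ratio(shifted))            # changes
```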

Ratio scale ("rational"? 😉) preferences do let you say that apples are six times as good as kiwis, and you need this property to maximize expected utility. You have to be able to weigh off the relative desirability of different outcomes, and ratio scale is the structure which lets you do it – the important content of a utility function isn't in its numerical values, but in the ratios of the valuations.

Replies from: mr-hire, Dagon
comment by Matt Goldenberg (mr-hire) · 2020-04-16T14:54:35.550Z · LW(p) · GW(p)

Isn't the typical assumption in game theory that preferences are ordinal? This suggests that you can make quite a few strategic decisions without bringing in ratio.

comment by Dagon · 2020-04-16T18:09:32.389Z · LW(p) · GW(p)

From what I have read, and from self-introspection, humans mostly have ordinal preferences. Some of them we can interpolate to interval scales or ratios (or higher-order functions) but if we extrapolate very far, we get odd results.

It turns out you can do a LOT with just ordinal preferences. Almost all real-world decisions are made this way.

comment by TurnTrout · 2020-03-26T23:51:48.935Z · LW(p) · GW(p)

It seems to me that Zeno's paradoxes leverage incorrect, naïve notions of time and computation. We exist in the world, and we might suppose that the world is being computed in some way. If time is continuous, then the computer might need to do some pretty weird things to determine our location at an infinite number of intermediate times. However, even if that were the case, we would never notice it – we exist within time and we would not observe the external behavior of the system which is computing us, nor its runtime.

Replies from: Pattern
comment by Pattern · 2020-03-28T06:18:45.520Z · LW(p) · GW(p)

What are your thoughts on infinitely small quantities?

Replies from: TurnTrout
comment by TurnTrout · 2020-03-28T13:14:46.395Z · LW(p) · GW(p)

Don't have much of an opinion - I haven't rigorously studied infinitesimals yet. I usually just think of infinite / infinitely small quantities as being produced by limiting processes. For example, the intersection of all the $\epsilon$-balls around a real number is just that number (under the standard topology), which set has 0 measure and is, in a sense, "infinitely small".

comment by TurnTrout · 2020-03-18T18:36:37.285Z · LW(p) · GW(p)

Very rough idea

In 2018, I started thinking about corrigibility as "being the kind of agent lots of agents would be happy to have activated". This seems really close to a more ambitious version of what AUP tries to do (not be catastrophic for most agents).

I wonder if you could build an agent that rewrites itself / makes an agent which would tailor the AU landscape towards its creators' interests, under a wide distribution of creator agent goals/rationalities/capabilities. And maybe you then get a kind of generalization, where most simple algorithms which solve this solve ambitious AI alignment in full generality.

comment by TurnTrout · 2020-02-06T16:56:15.771Z · LW(p) · GW(p)

My autodidacting has given me a mental reflex which attempts to construct a gears-level explanation of almost any claim I hear. For example, when listening to “Listen to Your Heart” by Roxette:

Listen to your heart,

There’s nothing else you can do

I understood what she obviously meant and simultaneously found myself subvocalizing “she means all other reasonable plans are worse than listening to your heart - not that that’s literally all you can do”.

This reflex is really silly and annoying in the wrong context - I’ll fix it soon. But it’s pretty amusing that this is now how I process claims by default, and I think it usually serves me well.

comment by TurnTrout · 2020-02-05T16:41:01.779Z · LW(p) · GW(p)

AFAICT, the deadweight loss triangle from eg price ceilings is just a lower bound on lost surplus. Inefficient allocation to consumers means that people who value the good less than the market equilibrium price can buy it, while the DWL triangle optimistically assumes that the consumers with the highest willingness to pay will buy up the limited supply.
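A rough numerical sketch of the gap, assuming linear supply and demand curves and made-up numbers. It compares the textbook triangle with the additional surplus lost if the limited supply is handed out at random among everyone willing to pay the ceiling price, rather than going to the highest-value buyers. The function name and all parameters are hypothetical.

```python
def price_ceiling_losses(a=10.0, b=1.0, c=0.0, d=1.0, ceiling=3.0):
    """Surplus lost under a binding price ceiling with linear curves.

    Demand: P = a - b*Q. Supply: P = c + d*Q. All numbers are illustrative.
    Returns (deadweight_triangle, extra_loss_from_random_allocation).
    """
    q_eq = (a - c) / (b + d)        # market-clearing quantity
    q_sold = (ceiling - c) / d      # quantity supplied at the ceiling price
    if q_sold >= q_eq:
        raise ValueError("ceiling is not binding")
    # Triangle between the demand and supply curves from q_sold to q_eq.
    wtp_at_q_sold = a - b * q_sold
    triangle = 0.5 * (q_eq - q_sold) * (wtp_at_q_sold - ceiling)
    # Gross consumer value if the highest-value buyers get the units.
    efficient_value = a * q_sold - 0.5 * b * q_sold ** 2
    # If units go at random to anyone willing to pay the ceiling, the average
    # buyer's valuation is the midpoint of [ceiling, a] under linear demand.
    random_value = q_sold * (a + ceiling) / 2.0
    return triangle, efficient_value - random_value


triangle, misallocation = price_ceiling_losses()
print(triangle, misallocation)
```

With these numbers the misallocation term is larger than the triangle itself, which is the sense in which the triangle is only a lower bound.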

Replies from: Wei_Dai, Dagon
comment by Wei_Dai · 2020-02-07T22:14:31.405Z · LW(p) · GW(p)

Good point. By searching for "deadweight loss price ceiling lower bound" I was able to find a source (see page 26) that acknowledges this, but most explications of price ceilings do not seem to mention that the triangle is just a lower bound for lost surplus.

comment by Dagon · 2020-02-07T00:02:40.045Z · LW(p) · GW(p)

Lost surplus is definitely a loss - it's not linear with utility, but it's not uncorrelated. Also, if supply is elastic over any relevant timeframe, there's an additional source of loss. And I'd argue that for most goods, over timeframes smaller than most price-fixing proposals are expected to last, there is significant price elasticity.

Replies from: TurnTrout
comment by TurnTrout · 2020-02-07T13:10:46.833Z · LW(p) · GW(p)

Lost surplus is definitely a loss - it's not linear with utility, but it's not uncorrelated.

I don't think I was disagreeing?

Replies from: Dagon
comment by Dagon · 2020-02-07T17:13:50.111Z · LW(p) · GW(p)

Ah, I took the "just" in "just a lower bound on lost surplus" as an indicator that it's less important than other factors. And I lightly believe (meaning: for the cases I find most available, I believe it, but I don't know how general it is) that the supply elasticity _is_ the more important effect of such distortions.

So I wanted to reinforce that I wasn't ignoring that cost, only pointing out a greater cost.

comment by TurnTrout · 2019-12-26T16:51:41.378Z · LW(p) · GW(p)

The framing effect & aversion to losses generally cause us to execute more cautious plans. I’m realizing this is another reason to reframe my x-risk motivation from “I won’t let the world be destroyed” to “there’s so much fun we could have, and I want to make sure that happens”. I think we need more exploratory thinking in alignment research right now.

(Also, the former motivation style led to me crashing and burning a bit when my hands were injured and I was no longer able to do much.)

ETA: Actually, I’m realizing I had the effect backwards. Framing via losses actually encourages more risk-taking plans. Oops. I’d like to think about this more, since I notice my model didn’t protest when I argued the opposite of the experimental conclusions.

Replies from: TurnTrout, Isnasene
comment by TurnTrout · 2019-12-26T19:49:07.630Z · LW(p) · GW(p)

I’m realizing how much more risk-neutral I should be:

Paul Samuelson... offered a colleague a coin-toss gamble. If the colleague won the coin toss, he would receive $200, but if he lost, he would lose $100. Samuelson was offering his colleague a positive expected value with risk. The colleague, being risk-averse, refused the single bet, but said that he would be happy to toss the coin 100 times! The colleague understood that the bet had a positive expected value and that across lots of bets, the odds virtually guaranteed a profit. Yet with only one trial, he had a 50% chance of regretting taking the bet.

Notably, Samuelson’s colleague doubtless faced many gambles in life… He would have fared better in the long run by maximizing his expected value on each decision... all of us encounter such “small gambles” in life, and we should try to follow the same strategy. Risk aversion is likely to tempt us to turn down each individual opportunity for gain. Yet the aggregated risk of all of the positive expected value gambles that we come across would eventually become infinitesimal, and potential profit quite large.
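To put a number on "virtually guaranteed", here is a quick sketch that computes the exact probability of ending with a net loss after some number of fair coin tosses, each paying +$200 on a win and -$100 on a loss. The function name is made up; the payoffs are the ones from the quote.

```python
from math import comb


def prob_net_loss(n_tosses, win=200, loss=100):
    """Probability that n fair coin-toss bets end with a net loss."""
    losing_outcomes = sum(
        comb(n_tosses, wins)
        for wins in range(n_tosses + 1)
        if wins * win - (n_tosses - wins) * loss < 0
    )
    return losing_outcomes / 2 ** n_tosses


print(prob_net_loss(1))    # a single bet loses half the time
print(prob_net_loss(100))  # 100 bets: a net loss needs 33 or fewer wins
```

For 100 tosses the chance of a net loss comes out under 0.1%, while the expected profit is $5,000.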

comment by Isnasene · 2019-12-27T01:36:36.046Z · LW(p) · GW(p)

For what it's worth, I tried something like the "I won't let the world be destroyed"->"I want to make sure the world keeps doing awesome stuff" reframing back in the day and it broadly didn't work. This had less to do with cautious/uncautious behavior and more to do with status quo bias. Saying "I won't let the world be destroyed" treats "the world being destroyed" as an event that deviates from the status quo of the world existing. In contrast, saying "There's so much fun we could have" treats "having more fun" as the event that deviates from the status quo of us not continuing to have fun.

When I saw the world being destroyed as status quo, I cared a lot less about the world getting destroyed.

comment by TurnTrout · 2019-11-26T18:43:56.755Z · LW(p) · GW(p)

I was having a bit of trouble holding the point of quadratic residues in my mind. I could effortfully recite the definition, give an example, and walk through the broad-strokes steps of proving quadratic reciprocity. But it felt fake and stale and memorized.

Alex Mennen suggested a great way of thinking about it. For some odd prime $p$, consider the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$. This group is abelian and has even order $p-1$. Now, consider a primitive root / generator $g$. By definition, every element of the group can be expressed as $g^e$. The quadratic residues are those expressible by $e$ even (this is why, for prime numbers, half of the group is square mod $p$). This also lets us easily see that the residual subgroup is closed under multiplication by $g^2$ (which generates it), that two non-residues multiply to make a residue, and that a residue and non-residue make a non-residue. The Legendre symbol then just tells us, for $a = g^e$, whether $e$ is even.

Now, consider composite numbers $n$ whose prime decomposition only contains $0$ or $1$ in the exponents. By the fundamental theorem of finite abelian groups and the Chinese remainder theorem, we see that a number is square mod $n$ iff it is square mod all of the prime factors.

I'm still a little confused about how to think of squares mod $p^e$.
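A brute-force check of this picture for one small prime, as a sketch: find a generator of the multiplicative group mod 11 and confirm that its even powers are exactly the nonzero squares. The helper name `primitive_root` and the choice of 11 are just for illustration.

```python
def primitive_root(p):
    """Smallest generator of the multiplicative group mod an odd prime p."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(1, p)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


p = 11
g = primitive_root(p)
squares = {(x * x) % p for x in range(1, p)}
even_powers = {pow(g, e, p) for e in range(0, p - 1, 2)}
print(g, sorted(squares), sorted(even_powers))
```

Both sets have (p - 1) / 2 elements, matching the "half of the group is square" observation.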

Replies from: AlexMennen
comment by AlexMennen · 2019-11-26T23:28:49.532Z · LW(p) · GW(p)

The theorem: where $k$ is relatively prime to an odd prime $p$ and $n < e$, $k p^n$ is a square mod $p^e$ iff $k$ is a square mod $p$ and $n$ is even.

The real meat of the theorem is the $n = 0$ case (i.e. a square mod $p$ that isn't a multiple of $p$ is also a square mod $p^e$). Deriving the general case from there should be fairly straightforward, so let's focus on this special case.

Why is it true? This question has a surprising answer: Newton's method for finding roots of functions. Specifically, we want to find a root of $f(x) = x^2 - k$, except in $\mathbb{Z}/p^e\mathbb{Z}$ instead of $\mathbb{R}$.

To adapt Newton's method to work in this situation, we'll need the p-adic absolute value on $\mathbb{Z}$: $|k p^n|_p := p^{-n}$ for $k$ relatively prime to $p$. This has lots of properties that you should expect of an "absolute value": it's positive ($|x|_p \ge 0$ with $|x|_p = 0$ only when $x = 0$), multiplicative ($|xy|_p = |x|_p |y|_p$), symmetric ($|-x|_p = |x|_p$), and satisfies a triangle inequality ($|x + y|_p \le |x|_p + |y|_p$; in fact, we get more in this case: $|x + y|_p \le \max(|x|_p, |y|_p)$). Because of positivity, symmetry, and the triangle inequality, the p-adic absolute value induces a metric (in fact, ultrametric, because of the strong version of the triangle inequality) $d(x, y) := |x - y|_p$. To visualize this distance function, draw $p$ giant circles, and sort integers into circles based on their value mod $p$. Then draw $p$ smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod $p^2$. Then draw $p$ even smaller circles inside each of those, and sort based on value mod $p^3$, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, $1, p, p^2, p^3, \ldots$ converges to $0$.

Now on to Newton's method: if $k$ is a square mod $p$, let $a$ be one of its square roots mod $p$. $|f(a)|_p \le p^{-1}$; that is, $a$ is somewhat close to being a root of $f$ with respect to the p-adic absolute value. $f'(x) = 2x$, so $|f'(a)|_p = 1$; that is, $f$ is steep near $a$. This is good, because starting close to a root and the slope of the function being steep enough are things that help Newton's method converge; in general, it might bounce around chaotically instead. Specifically, it turns out that, in this case, $|f(a)|_p < 1$ and $|f'(a)|_p = 1$ is exactly the right sense of being close enough to a root with steep enough slope for Newton's method to work.

Now, Newton's method says that, from $a$, you should go to $a_1 = a - \frac{f(a)}{f'(a)}$. $f'(a)$ is invertible mod $p^e$, so we can do this. Now here's the kicker: $f(a_1) = \frac{f(a)^2}{4a^2}$, so $|f(a_1)|_p = |f(a)|_p^2 < |f(a)|_p$. That is, $a_1$ is closer to being a root of $f$ than $a$ is. Now we can just iterate this process until we reach $a_m$ with $|f(a_m)|_p \le p^{-e}$, and we've found our square root of $k$ mod $p^e$.
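The iteration can be sketched in a few lines of Python. This is a minimal illustration, not an optimized routine: it assumes an odd prime p that does not divide k, finds the starting root mod p by brute force, and uses the three-argument `pow` with exponent -1 (available from Python 3.8) to invert the slope. The function name is made up.

```python
def sqrt_mod_prime_power(k, p, e):
    """Square root of k mod p**e for an odd prime p not dividing k, or None."""
    if p == 2 or k % p == 0:
        raise ValueError("this sketch assumes an odd prime p that does not divide k")
    # Starting point: brute-force a square root of k mod p.
    a = next((x for x in range(1, p) if (x * x - k) % p == 0), None)
    if a is None:
        return None  # k is not a square mod p, hence not a square mod p**e
    modulus, target = p, p ** e
    while modulus < target:
        # Each Newton step squares the p-adic size of the error, so after the
        # step the congruence a*a = k holds modulo the squared modulus.
        modulus = min(modulus * modulus, target)
        inverse_of_slope = pow(2 * a, -1, modulus)
        a = (a - (a * a - k) * inverse_of_slope) % modulus
    return a


root = sqrt_mod_prime_power(2, 7, 3)
print(root, (root * root) % 7 ** 3)
```

Since 2 is a square mod 7 (3 * 3 = 9), the lift succeeds and the second printed value is 2; calling the function with k = 3 returns `None`, because 3 is not a square mod 7.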

Exercise: Do the same thing with cube roots. Then with roots of arbitrary polynomials.

Replies from: AlexMennen
comment by AlexMennen · 2019-11-26T23:37:25.091Z · LW(p) · GW(p)

The part about derivatives might have seemed a little odd. After all, you might think, $\mathbb{Z}$ is a discrete set, so what does it mean to take derivatives of functions on it? One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we're using a different metric than usual, and $\mathbb{Z}$ isn't discrete at all! Indeed, for any number $k$, $\lim_{n \to \infty} (k + p^n) = k$, so no points are isolated, and we can define differentiation of functions on $\mathbb{Z}$ in exactly the usual way with limits.

comment by TurnTrout · 2019-10-22T19:22:03.859Z · LW(p) · GW(p)

I noticed I was confused and liable to forget my grasp on what the hell is so "normal" about normal subgroups. You know what that means - colorful picture time!

First, the classic definition. A subgroup $H \le G$ is normal when, for all group elements $g$, $gH = Hg$ (this is trivially true for all subgroups of abelian groups).

ETA: I drew the bounds a bit incorrectly; $g$ is most certainly within the left coset ($g = ge \in gH$).

Notice that nontrivial cosets aren't subgroups, because they don't have the identity $e$.

This "normal" thing matters because sometimes we want to highlight regularities in the group by taking a quotient. Taking an example from the excellent Visual Group Theory, the integers $\mathbb{Z}$ have a quotient group $\mathbb{Z}/12\mathbb{Z}$ consisting of the congruence classes $\{\bar{0}, \bar{1}, \ldots, \overline{11}\}$, each integer slotted into a class according to its value mod 12. We're taking a quotient with the cyclic subgroup $\langle 12 \rangle$.

So, what can go wrong? Well, if the subgroup isn't normal, strange things can happen when you try to take a quotient.

Here's what's happening:

Normality means that when you form the new Cayley diagram, the arrows behave properly. You're at the origin, $e$. You travel to $gH$ using $g$. What we need for this diagram to make sense is that if you follow any $h \in H$ you please, applying $g^{-1}$ means you go back to $H$. In other words, $ghg^{-1} \in H$. In other words, $gHg^{-1} \subseteq H$. In other other words (and using a few properties of groups), $gH = Hg$.
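For a concrete check, here is a small sketch that tests the coset condition (left coset equals right coset for every group element) inside the symmetric group on three elements. Permutations are represented as tuples; the helper names are made up for illustration.

```python
from itertools import permutations


def compose(p, q):
    """Permutation 'apply q, then p', with tuples representing i -> p[i]."""
    return tuple(p[q[i]] for i in range(len(q)))


def is_normal(subgroup, group):
    """Check that the left and right cosets agree for every g in the group."""
    return all(
        {compose(g, h) for h in subgroup} == {compose(h, g) for h in subgroup}
        for g in group
    )


S3 = list(permutations(range(3)))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # identity and the two 3-cycles
H = {(0, 1, 2), (1, 0, 2)}              # identity and one transposition
print(is_normal(A3, S3), is_normal(H, S3))
```

The rotation subgroup passes the test, while the two-element subgroup generated by a single swap does not, so quotienting by the latter would produce the kind of broken diagram described above.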

comment by TurnTrout · 2019-09-30T00:45:44.638Z · LW(p) · GW(p)

One of the reasons I think corrigibility might have a simple core principle is: it seems possible to imagine a kind of AI which would make a lot of different possible designers happy. That is, if you imagine the same AI design deployed by counterfactually different agents with different values and somewhat-reasonable rationalities, it ends up doing a good job by almost all of them. It ends up acting to further the designers' interests in each counterfactual. This has been a useful informal way for me to think about corrigibility, when considering different proposals.

This invariance also shows up (in a different way) in AUP, where the agent maintains its ability to satisfy many different goals. In the context of long-term safety, AUP agents are designed to avoid gaining power, which implicitly ends up respecting the control of other agents present in the environment (no matter their goals).

I'm interested in thinking more about this invariance, and why it seems to show up in a sensible way in two different places.

comment by TurnTrout · 2020-05-06T15:46:45.897Z · LW(p) · GW(p)

Continuous functions $f : \mathbb{R} \to \mathbb{R}$ can be represented by their rational support; in particular, for each real number $x$, choose a sequence of rational numbers $(x_n)$ converging to $x$, and let $f(x) = \lim_{n \to \infty} f(x_n)$.

Therefore, there is an injection from the vector space of continuous functions $\mathbb{R} \to \mathbb{R}$ to the vector space of all sequences $\mathbb{R}^{\mathbb{N}}$: since the rationals are countable, enumerate them by $q_1, q_2, \ldots$. Then the sequence $(f(q_1), f(q_2), \ldots)$ represents continuous function $f$.

Replies from: itaibn0
comment by itaibn0 · 2020-05-06T20:30:33.088Z · LW(p) · GW(p)

This map is not a surjection because not every map from the rational numbers to the real numbers is continuous, and so not every sequence represents a continuous function. It is injective, and so it shows that a basis for the latter space is at least as large in cardinality as a basis for the former space. One can construct an injective map in the other direction, showing that both spaces have bases of the same cardinality, and so they are isomorphic.

Replies from: TurnTrout
comment by TurnTrout · 2020-05-06T20:59:50.419Z · LW(p) · GW(p)

Fixed, thanks.

comment by TurnTrout · 2019-10-01T02:44:33.918Z · LW(p) · GW(p)

(Just starting to learn microecon, so please feel free to chirp corrections)

How diminishing marginal utility helps create supply/demand curves: think about the uses you could find for a pillow. Your first few pillows are used to help you fall asleep. After that, maybe some for your couch, and then a few spares to keep in storage. You prioritize pillow allocation in this manner; the value of the latter uses is much less than the value of having a place to rest your head.

How many pillows do you buy at a given price point? Well, if you buy any, you'll buy some for your bed at least. Then, when pillows get cheap enough, you'll start buying them for your couch. At what price, exactly? Depends on the person, and their utility function. So as the price goes up or down, it does or doesn't become worth it to buy pillows for different levels of the "use hierarchy".

Then part of what the supply/demand curve is reflecting is the distribution of pillow use valuations in the market. It tracks when different uses become worth it for different agents, and how significant these shifts are!
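A toy version of this, with made-up numbers: give each consumer a descending list of dollar values for their first, second, third pillow and so on, and have them buy every pillow whose marginal value is at least the price. Summing over consumers traces out a downward-sloping demand curve. The function name and all valuations are hypothetical.

```python
def quantity_demanded(price, marginal_values_by_consumer):
    """Total pillows bought: each consumer buys every pillow whose marginal
    value to them is at least the price."""
    return sum(
        sum(1 for value in values if value >= price)
        for values in marginal_values_by_consumer
    )


# Hypothetical dollar valuations, in descending order per consumer:
# bed pillows first, then couch pillows, then spares.
consumers = [
    [40, 40, 15, 15, 5],
    [60, 25, 10],
    [30, 30, 8, 8, 2],
]

for price in (50, 20, 10, 1):
    print(price, quantity_demanded(price, consumers))
```

The jumps in quantity as the price falls correspond to new levels of the "use hierarchy" becoming worth it for someone.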

comment by TurnTrout · 2021-09-21T17:38:03.780Z · LW(p) · GW(p)

Does anyone have tips on how to buy rapid tests in the US? Not seeing any on US Amazon, not seeing any in person back where I'm from. Considering buying German tests. Even after huge shipping costs, it'll come out to ~$12 a test, which is sadly competitive with US market prices.

Wasn't able to easily find tests on the Mexican and Canadian Amazon websites, and other EU countries don't seem to have them either. 

Replies from: Markas
comment by Markas · 2021-09-22T17:48:00.329Z · LW(p) · GW(p)

I've been able to buy from the CVS website several times in the past couple months, and even though they're sold out online now, they have some (sparse) in-store availability listed.  Worth checking there, Walgreens, etc. periodically.

comment by TurnTrout · 2021-05-14T22:46:01.681Z · LW(p) · GW(p)

The Baldwin effect

I couldn't find great explanations online, so here's my explanation after a bit of Googling. I welcome corrections from real experts.

Organisms exhibit phenotypic plasticity when they act differently in different environments. The phenotype (manifested traits: color, size, etc) manifests differently, even though two organisms might share the same genotype (genetic makeup). 

Panel 1: organisms are not phenotypically plastic and do not adapt to a spider-filled environment. Panel 2: a plastic organism might do the bad thing, and then learn not to do it again, increasing its fitness. Panel 3: genetic assimilation hard-codes important commonly learned lessons, cutting out the costs of learning.

The most obvious example of phenotypic plasticity is learning. Learning lets you adapt to environments where your genotype would otherwise do poorly. 

The Baldwin effect is: evolution selects for genome-level hardcoding of extremely important learned lessons. This might even look like, the mom learned not to eat spiders, but her baby is born knowing not to eat spiders. 

(No, it's not Lamarckian inheritance. That's still wrong.)

Replies from: Robbo
comment by Robbo · 2021-05-15T03:47:17.630Z · LW(p) · GW(p)

[disclaimer: not an expert, possibly still confused about the Baldwin effect]

A bit of feedback on this explanation: as written, it didn’t make clear to me what makes it a special effect. “Evolution selects for genome-level hardcoding of extremely important learned lessons.” As a reader I was like, what makes this a special case? If it’s a useful lesson then of course evolution would tend to select for knowing it innately - that does seem handy for an organism.

As I understand it, what is interesting about the Baldwin effect is that such hard coding is selected for more among creatures that can learn, and indeed because of learning. The learnability of the solution makes it even more important to be endowed with the solution. So individual learning, in this way, drives selection pressures. Dennett’s explanation emphasizes this - curious what you make of his?


Replies from: TurnTrout
comment by TurnTrout · 2021-05-15T16:37:15.893Z · LW(p) · GW(p)

As a reader I was like, what makes this a special case? If it’s a useful lesson then of course evolution would tend to select for knowing it innately - that does seem handy for an organism.

Right, I wondered this as well. I had thought its significance was that the effect seemed Lamarckian, but it wasn't. (And, I confess, I made the parent comment partly hoping that someone would point out that I'd missed the key significance of the Baldwin effect. As the joke goes, the fastest way to get your paper spell-checked is to comment it on a YouTube video!)

curious what you make of his?

Thanks for this link. One part which I didn't understand is why closeness in learning-space (given your genotype, you're plastic enough to learn to do something) must imply that you're close in genotype-space (evolution has a path of local improvements which implement genetic assimilation of the plastic advantage). I can learn to program computers. Does that mean that, given the appropriate selection pressures, my descendants would learn to program computers instinctively? In a reasonable timeframe?

It's not that I can't imagine such evolution occurring. It just wasn't clear why these distance metrics should be so strongly related.

Reading the link, Dennett points out this assumption and discusses why it might be reasonable, and how we might test it.

comment by TurnTrout · 2021-02-17T16:53:49.338Z · LW(p) · GW(p)

I went into a local dentist's office to get more prescription toothpaste; I was wearing my 3M p100 mask (with a surgical mask taped over the exhaust, in order to protect other people in addition to the native exhaust filtering offered by the mask). When I got in, the receptionist was on the phone. I realized it would be more sensible for me to wait outside and come back in, but I felt a strange reluctance to do so. It would be weird and awkward to leave right after entering. I hovered near the door for about 5 seconds before actually leaving. I was pretty proud that I was able to override my naive social instincts in a situation where they really didn't make sense (I will never see that person again, and they would probably see my minimizing shared air as courteous anyways), to both my benefit and the receptionist's.

Also, p100 masks are amazing! When I got home, I used hand sanitizer. I held my sanitized hands right up to the mask filters, but I couldn't even sniff a trace of the alcohol. When the mask came off, the alcohol slammed into my nose immediately.

comment by TurnTrout · 2021-01-14T15:50:43.927Z · LW(p) · GW(p)

(This is a basic point on conjunctions, but I don't recall seeing its connection to Occam's razor anywhere)

When I first read Occam's Razor [LW · GW] back in 2017, it seemed to me that the essay only addressed one kind of complexity: how complex the laws of physics are. If I'm not sure whether the witch did it, the universes where the witch did it are more complex, and so these explanations are exponentially less likely under a simplicity prior. Fine so far.

But there's another type. Suppose I'm weighing whether the United States government is currently engaged in a vast conspiracy to get me to post this exact comment. This hypothesis doesn't really demand a more complex source code, but I think we'd say that Occam's razor shaves away this hypothesis anyways - even before weighing object-level considerations. This hypothesis is complex in a different way: it's highly conjunctive in its unsupported claims about the current state of the world. Each conjunct eliminates many ways it could be true, from my current uncertainty, and so I should deem it correspondingly less likely.
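
To make the conjunction penalty concrete, here is a minimal sketch. The numbers are hypothetical, and the conjuncts are assumed to be independent, which real conspiracy claims usually are not; the point is only that each extra unsupported claim multiplies the probability down.

```python
from math import prod

def conjunction_probability(conjunct_probs):
    """Probability that every conjunct holds, ASSUMING the conjuncts are independent."""
    return prod(conjunct_probs)

# Hypothetical example: ten unsupported claims, each granted a generous 50%.
ten_claims = [0.5] * 10
p_all = conjunction_probability(ten_claims)  # equals 0.5 ** 10

# Adding one more conjunct can only keep the probability the same or lower it.
p_more = conjunction_probability(ten_claims + [0.9])
```

Even with every individual claim at even odds, the ten-part hypothesis ends up below one in a thousand.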

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2021-01-14T22:05:06.440Z · LW(p) · GW(p)

I agree with the principle but I'm not sure I'd call it "Occam's razor". Occam's razor is a bit sketchy, it's not really a guarantee of anything, it's not a mathematical law, it's like a rule of thumb or something. Here you have a much more solid argument: multiplying many probabilities into a conjunction makes the result smaller and smaller. That's a mathematical law, rock-solid. So I'd go with that...

Replies from: TurnTrout
comment by TurnTrout · 2021-01-14T22:08:25.494Z · LW(p) · GW(p)

My point was more that "people generally call both of these kinds of reasoning 'Occam's razor', and they're both good ways to reason, but they work differently."

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2021-01-14T22:18:42.202Z · LW(p) · GW(p)

Oh, hmm, I guess that's fair, now that you mention it I do recall hearing a talk where someone used "Occam's razor" to talk about the Solomonoff prior. Actually he called it "Bayes Occam's razor" I think. He was talking about a probabilistic programming algorithm.

That's (1) not physics, and (2) includes (as a special case) penalizing conjunctions, so maybe related to what you said. Or sorry if I'm still not getting what you meant.

comment by TurnTrout · 2021-01-02T03:59:05.155Z · LW(p) · GW(p)

Instead of waiting to find out you were confused about new material you learned, pre-emptively google things like "common misconceptions about [concept]" and put the answers in your spaced repetition system, or otherwise magically remember them.

comment by TurnTrout · 2020-12-09T19:58:53.989Z · LW(p) · GW(p)

At a poster session today, I was asked how I might define "autonomy" from an RL framing; "power" is well-definable in RL, and the concepts seem reasonably similar. 

I think that autonomy is about having many ways to get what you want. If your attainable utility is high, but there's only one trajectory which really makes good things happen, then you're hemmed-in and don't have much of a choice. But if you have many policies which make good things happen, you have a lot of slack and you have a lot of choices. This would be a lot of autonomy.

This has to be subjectively defined for embedded agency reasons, and so the attainable utility / policies are computed with respect to the agent/environment abstraction you use to model yourself in the world.
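
One crude way to make "many ways to get what you want" concrete is sketched below. Everything here is an assumption made for illustration: the two toy reward functions, the action names, and the use of fixed-length open-loop plans as a stand-in for policies. Both toy environments have the same best achievable return, but they differ in how many plans achieve it.

```python
from itertools import product

def count_good_plans(reward_fn, actions, horizon, threshold):
    """Return (best return, number of action sequences with return >= threshold)."""
    returns = [reward_fn(plan) for plan in product(actions, repeat=horizon)]
    best = max(returns)
    good = sum(r >= threshold for r in returns)
    return best, good

# Hypothetical "hemmed-in" environment: exactly one plan pays off.
def narrow_reward(plan):
    return 1.0 if plan == ("left", "left", "right") else 0.0

# Hypothetical "slack" environment: any plan ending in "right" pays off.
def slack_reward(plan):
    return 1.0 if plan[-1] == "right" else 0.0

ACTIONS = ("left", "right")
narrow = count_good_plans(narrow_reward, ACTIONS, horizon=3, threshold=1.0)
slack = count_good_plans(slack_reward, ACTIONS, horizon=3, threshold=1.0)
```

Here `narrow` and `slack` agree on the best return but not on the count of good plans, which is the gap between high attainable utility and high autonomy in this toy framing.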

comment by TurnTrout · 2020-12-09T18:58:43.989Z · LW(p) · GW(p)

In Markov decision processes, state-action reward functions seem less natural to me than state-based reward functions, at least if they assign different rewards to equivalent actions. That is, actions $a_1, a_2$ at a state $s$ can have different reward $R(s, a_1) \neq R(s, a_2)$ even though they induce the same transition probabilities: $\forall s': T(s, a_1, s') = T(s, a_2, s')$. This is unappealing because the actions don't actually have a "noticeable difference" from within the MDP, and the MDP is visitation-distribution-isomorphic to an MDP without the action redundancy.
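
A small sketch of what this redundancy looks like in a tabular MDP. The dictionary layout, state and action names, and reward values are all made up for illustration. The helper groups actions at a state by their transition distribution and flags groups whose members receive different rewards.

```python
from collections import defaultdict

# Hypothetical tabular MDP fragment.
# transitions[state][action] = {next_state: probability}
transitions = {
    "s0": {
        "a1": {"s1": 1.0},
        "a2": {"s1": 1.0},  # same transition distribution as a1
        "a3": {"s2": 1.0},
    },
}
# State-action rewards that distinguish the equivalent actions a1 and a2.
rewards = {("s0", "a1"): 0.0, ("s0", "a2"): 1.0, ("s0", "a3"): 0.5}

def equivalent_action_groups(transitions, state):
    """Group the actions at `state` that induce identical transition distributions."""
    groups = defaultdict(list)
    for action, dist in transitions[state].items():
        key = frozenset((s, p) for s, p in dist.items() if p > 0)
        groups[key].append(action)
    return sorted(sorted(group) for group in groups.values())

def redundant_reward_conflicts(transitions, rewards, state):
    """Return groups of equivalent actions that are assigned different rewards."""
    return [
        group
        for group in equivalent_action_groups(transitions, state)
        if len({rewards[(state, a)] for a in group}) > 1
    ]

groups = equivalent_action_groups(transitions, "s0")
conflicts = redundant_reward_conflicts(transitions, rewards, "s0")
```

A state-based reward function cannot produce such a conflict, since it never sees the action at all.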

comment by TurnTrout · 2020-10-21T02:37:18.929Z · LW(p) · GW(p)
From unpublished work.

The answer to this seems obvious in isolation: shaping helps with credit assignment, rescaling doesn't (and might complicate certain methods in the advantage vs Q-value way). But I feel like maybe there's an important interaction here that could inform a mathematical theory of how a reward signal guides learners through model space?

comment by TurnTrout · 2020-09-26T22:24:17.184Z · LW(p) · GW(p)

Reasoning about learned policies via formal theorems on the power-seeking incentives of optimal policies

One way instrumental subgoals might arise in actual learned policies: we train a proto-AGI reinforcement learning agent with a curriculum including a variety of small subtasks. The current theorems show sufficient conditions for power-seeking tending to be optimal in fully-observable environments; many environments meet these sufficient conditions; optimal policies aren't hard to compute for the subtasks. One highly transferable heuristic would therefore be to gain power in new environments, and then figure out what to do for the specific goal at hand. This may or may not take the form of an explicit mesa-objective embedded in e.g. the policy network.

Later, the heuristic has the agent seek power for the "real world" environment.

comment by TurnTrout · 2020-07-23T22:45:56.637Z · LW(p) · GW(p)

In order to reduce bias (halo effect, racism, etc), shouldn't many judicial proceedings generally be held over telephone, and/or through digital audio-only calls with voice anonymizers? 

Replies from: mark-xu
comment by Mark Xu (mark-xu) · 2020-07-25T04:11:22.213Z · LW(p) · GW(p)

I don't see strong reasons why this isn't a good idea. I have heard that technical interviews sometimes get conducted with voice anonymizers.

comment by TurnTrout · 2020-07-22T16:22:01.647Z · LW(p) · GW(p)

I prompted GPT-3 with modified versions of Eliezer's Beisutsukai stories, where I modified the "class project" to be about solving intent alignment instead of quantum gravity. 

... Taji looked over his sheets. "Okay, I think we've got to assume that every avenue that Eld science was trying is a blind alley, or they would have found it. And if this is possible to do in one month, the answer must be, in some sense, elegant. So no human mistake models. If we start doing anything that looks like we should call it 'utility function patching', we'd better stop."

He turned to Brennan. "Do you have any ideas?"

"I don't think so," she said. She glanced at her paper, frowning. "There are two problems here."

One problem was simple enough: she didn't understand what it meant for an AI to "want" to help a human. This was hard because as far as she could tell, there hadn't been much research done on this question. It seemed to be something that only humans did, but there wasn't any evidence that this was true. What was it about wanting? Was it a biological drive or a social instinct? A combination of both? More importantly, since it was a human concept, how could she measure if an AI behaved the way humans would in certain situations when the AI was obviously not human?

Then there was the question of practicality. The most elegant solution was not always the best. This question—could an AI desire to help its overseer?—was highly suited to explore this problem, but it still wouldn't be a real test. Humans operated on a "helpfulness scale" which started from things like cashiers wishing you "have a nice day" and descending to Nazi concentration camp guards.

She wanted to craft an AI that would at least be a good clerk in today's society. That was hardly an ambitious goal for a team of four highly intelligent teenagers. And yet, and yet...

"We can look at this differently," Hyori said. "People are born with a blank mind. We can shape them however we want. AIs on the other hand, are born with 800 terabytes of behavioral examples from the moment they're activated. The only data we have about unprogrammed AIs is that they either stay still or randomly move around. All this ... it's not making any confident in how easy this will be." [?]

Brennan stopped writing and turned to look at her, frowning. "So what are you saying?"

"I don't want to approach this problem by trying to divert the AI from its goal," she said. "What if, instead of changing the mind of an AI, we instead changed the environment that an AI found itself in?"

The team fell silent. 

Styrlyn broke the silence. "Uh..."

"What I mean is," she said, "what if, instead of trying to divert the AI from one task, we created a situation where accomplishing two tasks would be more beneficial than accomplishing just one? We don't need to patch new programs into the mind of an AI to make it want to help us. We can literally make helping us the most logical decision for it."

Full transcript.

comment by TurnTrout · 2020-07-10T21:20:58.213Z · LW(p) · GW(p)

Transparency Q: how hard would it be to ensure a neural network doesn't learn any explicit NANDs?

comment by TurnTrout · 2020-06-27T20:29:07.335Z · LW(p) · GW(p)

Physics has existed for hundreds of years. Why can you reach the frontier of knowledge with just a few years of study? Think of all the thousands of insights and ideas and breakthroughs that have been had - yet, I do not imagine you need most of those to grasp modern consensus.

Idea 1: the tech tree is rather horizontal - for any given question, several approaches and frames are tried. Some are inevitably more attractive or useful. You can view a Markov decision process in several ways - through the Bellman equations, through the structure of the state visitation distribution functions, through the environment's topology, through Markov chains induced by different policies. Almost everyone thinks about them in terms of Bellman equations; there were thousands of papers on that frame pre-2010, and you don't need to know most of them to understand how deep Q-learning works.

Idea 2: some "insights" are wrong (phlogiston) or approximate (Newtonian mechanics) and so are later discarded. The insights become historical curiosities and/or pedagogical tools and/or numerical approximations of a deeper phenomenon. 

Idea 3: most work is on narrow questions which end up being dead-ends or not generalizing. As a dumb example, I could construct increasingly precise torsion balance pendulums, in order to measure the mass of my copy of Dune to increasing accuracies. I would be learning new facts about the world using a rigorous and accepted methodology. But no one would care. 

More realistically, perhaps only a few other algorithms researchers care about my refinement of a specialized sorting algorithm (from  to ), but the contribution is still quite publishable and legible. 

I'm not sure what publishing incentives were like before the second half of the 20th century, so perhaps this kind of research was less incentivized in the past.

Replies from: Viliam
comment by Viliam · 2020-06-27T22:47:34.606Z · LW(p) · GW(p)

Could this depend on your definition of "physics"? Like, if you use a narrow definition like "general relativity + quantum mechanics", you can learn that in a few years. But if you include things like electricity, expansion of universe, fluid mechanics, particle physics, superconductors, optics, string theory, acoustics, aerodynamics... most of them may be relatively simple to learn, but all of them together are too much.

Replies from: TurnTrout
comment by TurnTrout · 2020-06-28T04:07:40.864Z · LW(p) · GW(p)

Maybe. I don't feel like that's the key thing I'm trying to point at here, though. The fact that you can understand any one of those in a reasonable amount of time is still surprising, if you step back far enough.

comment by TurnTrout · 2020-06-24T01:16:46.096Z · LW(p) · GW(p)

When under moral uncertainty, rational EV maximization will look a lot like preserving attainable utility / choiceworthiness for your different moral theories / utility functions, while you resolve that uncertainty.

Replies from: MichaelA
comment by MichaelA · 2020-06-24T03:05:06.050Z · LW(p) · GW(p)

This seems right to me, and I think it's essentially the rationale for the idea of the Long Reflection [EA · GW].

comment by TurnTrout · 2020-03-27T13:30:45.421Z · LW(p) · GW(p)

To prolong my medicine stores by 200%, I've mixed in similar-looking iron supplement placebos with my real medication. (To be clear, nothing serious happens to me if I miss days)