Is it safe to spend time with people who already recovered from COVID? 2020-12-02T22:06:13.469Z
Non-Obstruction: A Simple Concept Motivating Corrigibility 2020-11-21T19:35:40.445Z
Math That Clicks: Look for Two-Way Correspondences 2020-10-02T01:22:18.177Z
Power as Easily Exploitable Opportunities 2020-08-01T02:14:27.474Z
Generalizing the Power-Seeking Theorems 2020-07-27T00:28:25.677Z
GPT-3 Gems 2020-07-23T00:46:36.815Z
To what extent is GPT-3 capable of reasoning? 2020-07-20T17:10:50.265Z
What counts as defection? 2020-07-12T22:03:39.261Z
Corrigibility as outside view 2020-05-08T21:56:17.548Z
How should potential AI alignment researchers gauge whether the field is right for them? 2020-05-06T12:24:31.022Z
Insights from Euclid's 'Elements' 2020-05-04T15:45:30.711Z
Problem relaxation as a tactic 2020-04-22T23:44:42.398Z
A Kernel of Truth: Insights from 'A Friendly Approach to Functional Analysis' 2020-04-04T03:38:56.537Z
Research on repurposing filter products for masks? 2020-04-03T16:32:21.436Z
ODE to Joy: Insights from 'A First Course in Ordinary Differential Equations' 2020-03-25T20:03:39.590Z
Conclusion to 'Reframing Impact' 2020-02-28T16:05:40.656Z
Reasons for Excitement about Impact of Impact Measure Research 2020-02-27T21:42:18.903Z
Attainable Utility Preservation: Scaling to Superhuman 2020-02-27T00:52:49.970Z
How Low Should Fruit Hang Before We Pick It? 2020-02-25T02:08:52.630Z
Continuous Improvement: Insights from 'Topology' 2020-02-22T21:58:01.584Z
Attainable Utility Preservation: Empirical Results 2020-02-22T00:38:38.282Z
Attainable Utility Preservation: Concepts 2020-02-17T05:20:09.567Z
The Catastrophic Convergence Conjecture 2020-02-14T21:16:59.281Z
Attainable Utility Landscape: How The World Is Changed 2020-02-10T00:58:01.453Z
Does there exist an AGI-level parameter setting for modern DRL architectures? 2020-02-09T05:09:55.012Z
AI Alignment Corvallis Weekly Info 2020-01-26T21:24:22.370Z
On Being Robust 2020-01-10T03:51:28.185Z
Judgment Day: Insights from 'Judgment in Managerial Decision Making' 2019-12-29T18:03:28.352Z
Can fear of the dark bias us more generally? 2019-12-22T22:09:42.239Z
Clarifying Power-Seeking and Instrumental Convergence 2019-12-20T19:59:32.793Z
Seeking Power is Often Provably Instrumentally Convergent in MDPs 2019-12-05T02:33:34.321Z
How I do research 2019-11-19T20:31:16.832Z
Thoughts on "Human-Compatible" 2019-10-10T05:24:31.689Z
The Gears of Impact 2019-10-07T14:44:51.212Z
World State is the Wrong Abstraction for Impact 2019-10-01T21:03:40.153Z
Attainable Utility Theory: Why Things Matter 2019-09-27T16:48:22.015Z
Deducing Impact 2019-09-24T21:14:43.177Z
Value Impact 2019-09-23T00:47:12.991Z
Reframing Impact 2019-09-20T19:03:27.898Z
What You See Isn't Always What You Want 2019-09-13T04:17:38.312Z
How often are new ideas discovered in old papers? 2019-07-26T01:00:34.684Z
TurnTrout's shortform feed 2019-06-30T18:56:49.775Z
Best reasons for pessimism about impact of impact measures? 2019-04-10T17:22:12.832Z
Designing agent incentives to avoid side effects 2019-03-11T20:55:10.448Z
And My Axiom! Insights from 'Computability and Logic' 2019-01-16T19:48:47.388Z
Penalizing Impact via Attainable Utility Preservation 2018-12-28T21:46:00.843Z
Why should I care about rationality? 2018-12-08T03:49:29.451Z
A New Mandate 2018-12-06T05:24:38.351Z
Towards a New Impact Measure 2018-09-18T17:21:34.114Z
Impact Measure Desiderata 2018-09-02T22:21:19.395Z


Comment by turntrout on Covid 12/3: Land of Confusion · 2020-12-04T04:00:36.928Z · LW · GW

Twitter is almost always blocked on my devices, as well.

Comment by turntrout on Developmental Stages of GPTs · 2020-12-04T02:54:24.160Z · LW · GW

What is the formal definition of 'power seeking'?

The freshly updated paper answers this question in great detail; see section 6 and also appendix B.

Comment by turntrout on SETI Predictions · 2020-12-01T19:39:34.044Z · LW · GW

Why do people have such low credences for "The effect of First contact is mostly harmful (e.g., selfish ETI, hazards)"? Most alien minds probably don't care about us? But perhaps caring about variety is evolutionarily convergent? If not, why wouldn't our "first contact" be extremely negative (given their tech advantage)?

Comment by turntrout on TurnTrout's shortform feed · 2020-11-30T03:53:04.370Z · LW · GW 

Comment by turntrout on TurnTrout's shortform feed · 2020-11-25T17:38:31.692Z · LW · GW

Over the last 2.5 years, I've read a lot of math textbooks. Not using Anki / spaced repetition systems over that time has been an enormous mistake. My factual recall seems worse-than-average among my peers, but when supplemented with Anki, it's far better than average (hence, I was able to learn 2000+ Japanese characters in 90 days, in college). 

I considered using Anki for math in early 2018, but I dismissed it quickly because I hadn't had good experience using that application for things which weren't languages. I should have at least tried to see if I could repurpose my previous success! I'm now happily using Anki to learn measure theory and ring theory, and I can already tell that it's sticking far better. 

This mistake has had real consequences. I've gotten far better at proofs and I'm quite good at real analysis (I passed a self-administered graduate qualifying exam in the spring), but I have to look some things up for probability theory. Not a good look in interviews. I might have to spend weeks of extra time reviewing things I could have already stashed away in an Anki deck.


Comment by turntrout on Just another day in utopia · 2020-11-24T04:54:11.473Z · LW · GW

What a beautiful, bold, and chaotic story. Thanks for writing, Stuart.

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-23T20:24:10.343Z · LW · GW

I also did give arguments above, but people mostly made jokes about my punctuation! #grumpy

This is a timeless part of the LessWrong experience, my friend. 

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-23T15:31:47.258Z · LW · GW

Argument screens off authority, and I'm interested in hearing arguments. While that information should be incorporated into your prior, I don't see why it's worth mentioning as a counterargument. (To be sure, I'm not claiming that jacobjacob isn't a good predictor in general.)

Comment by turntrout on TurnTrout's shortform feed · 2020-11-23T03:10:19.610Z · LW · GW

I remarked to my brother, Josh, that when most people find themselves hopefully saying "here's how X can still happen!", it's a lost cause and they should stop grasping for straws and move on with their lives. Josh grinned, pulled out his cryonics necklace, and said "here's how I can still not die!"

Comment by turntrout on Non-Obstruction: A Simple Concept Motivating Corrigibility · 2020-11-22T21:25:13.491Z · LW · GW

Do I intend to do something with people's predictions? Not presently, but I think people giving predictions is good both for the reader (to ingrain the concepts by thinking things through enough to provide a credence / agreement score) and for the community (to see where people stand wrt these ideas).

Comment by turntrout on The Catastrophic Convergence Conjecture · 2020-11-22T18:16:01.148Z · LW · GW

The catastrophic convergence conjecture was originally formulated in terms of "outer alignment catastrophes tending to come from power-seeking behavior." I think that this was a mistake: I meant to talk about impact alignment catastrophes tending to be caused by power-seeking. I've updated the post accordingly.

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-22T04:37:23.552Z · LW · GW

jacobjacob once again seems too pessimistic; my posterior weighs heavily toward: when habryka makes a 60% yes prediction about a decision he has (partial) control over, for a functionality which the community has glommed onto thus far, the community is also justified in expressing ~60% belief that the feature ships. :)

Also, we aren't selecting from "most possible features"!

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-21T20:37:25.007Z · LW · GW

I actually waited for this feature to post!

Comment by turntrout on AGI Predictions · 2020-11-21T20:33:34.410Z · LW · GW

For me, it's because there's disjunctively many ways that AGI could not happen (global totalitarian regime, AI winter, 55% CFR avian flu escapes a BSL4 lab, unexpected difficulty building AGI & the planning fallacy on timelines which we totally won't fall victim to this time...), or that alignment could be solved, or that I could be mistaken about AGI risk being a big deal, or... 

Granted, I assign small probabilities to several of these events. But my credence for P(AGI extinction | no more AI alignment work from community) is 70% - much higher than my 40% unconditional credence. I guess that means yes, I think AGI risk is huge (remember that I'm saying "40% chance we just die to AGI, unconditionally"), and that's after incorporating the significant contributions which I expect the current community to make. The current community is far from sufficient, but it's also probably picking a good amount of low-hanging fruit, and so I expect that its presence makes a significant difference.

EDIT: I'm decreasing the 70% to 60% to better match my 40% unconditional, because only the current alignment community stops working on alignment. 

Comment by turntrout on AGI Predictions · 2020-11-21T20:20:49.645Z · LW · GW

”Catastrophic” is normally used in the term ”global catastrophic risk” and means something like “kills 100,000s of people”, so I do think “doesn’t necessarily kill but could’ve killed a couple of people” is a fairly different meaning.

Agreed. In retrospect, I might have opted for "pre-AGI nearly-deadly accident caused by deceptive alignment." 

In retrospect I realize that I put my answer to the second question far too high — if it just means “a deceptive aligned system nearly gives a few people in hospital a fatal dosage but it’s stopped and we don’t know why the system messed up” then it’s quite plausible nothing this substantial will happen as a result of that.

I intended the situation to be more like "we catch the AI pretending to be aligned, but actually lying, and it almost or does kill at least a few people as a result of that." 

With #1, I'm trying to have people predict the scenario where "deception is robustly instrumental behavior, but AIs will be bad at it at first and so we'll catch them." #2 is trying to operationalize whether this would be viewed as a fire alarm.

Some ways you might think scenario #1 won't happen:

  • You don't think deception will be incentivized
  • Fast takeoff means the AI is never simultaneously smart enough to deceive and dumb enough to get caught
  • Our transparency tools won't be good enough for many people to believe it was actually deceptively aligned
Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-21T19:37:18.403Z · LW · GW

This question resolves yes.

Comment by turntrout on AGI Predictions · 2020-11-21T06:10:11.761Z · LW · GW

In the following, an event is "catastrophic" if it endangers several human lives; it need not be an existential catastrophe.

Edit: I meant to say "deceptive alignment", but the meaning should be clear either way.

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-21T03:08:37.974Z · LW · GW

Smiles knowingly from behind steepled hands

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-20T22:01:50.634Z · LW · GW

You've just handed me so much power to prove people right or wrong on the internet... 

Comment by turntrout on Embedded Interactive Predictions on LessWrong · 2020-11-20T20:43:42.361Z · LW · GW

This looks so good! Great work by both Ought and the LW mods.

Perhaps this can also gauge approval (from 0% (I hate it) to 100% (I love it)), or even function as a poll (first decile: research agenda 1 seems most promising; second decile: research area 2...)... or Ought we not do things like that?

Comment by turntrout on Learning Normativity: A Research Agenda · 2020-11-13T04:22:07.076Z · LW · GW

Another thing which seems to "gain something" every time it hops up a level of meta: Corrigibility as Outside View. Not sure what the fixed points are like, if there are any, and I don't view what I wrote as attempting to meet these desiderata. But I think there's something interesting that's gained each time you go meta. 

Comment by turntrout on TurnTrout's shortform feed · 2020-11-06T18:19:25.977Z · LW · GW

Sure, but why would those changes tend to favor Trump as you get outside of a small neighborhood? Like, why would P(Biden wins | Biden or Trump wins) < .5? I agree it would at least approach .5 as the neighborhood grows. I think.

Comment by turntrout on TurnTrout's shortform feed · 2020-11-06T17:05:49.323Z · LW · GW

Why do you think that? How do you know that?

Comment by turntrout on TurnTrout's shortform feed · 2020-11-05T21:50:18.519Z · LW · GW

I read someone saying that ~half of the universes in a neighborhood of ours went to Trump. But... this doesn't seem right. Assuming Biden wins in the world we live in, consider the possible perturbations to the mental states of each voter. (Big assumption! We aren't thinking about all possible modifications to the world state. Whatever that means.)

Assume all 2020 voters would be equally affected by a perturbation (which you can just think of as a decision-flip for simplicity, perhaps). Since we're talking about a neighborhood ("worlds pretty close to ours"), each world-modification is limited to N decision flips (where N isn't too big).

  • There are combinatorially more ways for a race to be close (in popular vote) than for it to not be close. But we're talking perturbations, and so since we're assuming Biden wins in this timeline, he's still winning in most other timelines close to ours
    • I don't know whether the electoral college really changes this logic. If we only consider a single state (PA), then it probably doesn't?
  • I'm also going to imagine that most decision-flips didn't have too many downstream effects, but this depends on when the intervention takes place: if it's a week beforehand, maybe people announce changes-of-heart to their families? A lot to think about there. I'll just pretend like they're isolated because I don't feel like thinking about it that long, and it's insanely hard to play out all those effects.
  • Since these decision-flips are independent, you don't get any logical correlations: the fact that I randomly changed my vote, doesn't change how I expect people like me to vote. This is big.

Under my extremely simplified model, the last bullet is what makes me feel like most universes in our neighborhood were probably also Biden victories.
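This simplified model can be sketched as a quick Monte Carlo. All numbers here are illustrative placeholders (e.g., a Biden margin of ~80,000 among ~6.8M voters, with each nearby world getting N = 1,000 independent decision-flips); flips are drawn with replacement and downstream effects are ignored, per the assumptions above.

```python
import random


def winner_after_flips(margin, n_flips, n_voters, rng):
    """Flip n_flips uniformly random voters; return the winner.

    margin = (Biden votes - Trump votes). Flipping a Biden voter
    shrinks it by 2; flipping a Trump voter grows it by 2. Flips are
    independent (no logical correlations), matching the model above.
    """
    biden_voters = (n_voters + margin) // 2
    for _ in range(n_flips):
        if rng.random() < biden_voters / n_voters:
            margin -= 2
            biden_voters -= 1
        else:
            margin += 2
            biden_voters += 1
    return "Biden" if margin > 0 else "Trump"


def fraction_biden_wins(margin=80_000, n_voters=6_800_000, n_flips=1_000,
                        trials=200, seed=0):
    """Fraction of sampled nearby worlds in which Biden still wins."""
    rng = random.Random(seed)
    wins = sum(winner_after_flips(margin, n_flips, n_voters, rng) == "Biden"
               for _ in range(trials))
    return wins / trials
```

With N = 1,000 flips against an 80,000-vote margin, the margin moves by at most 2,000, so every sampled nearby world still goes to Biden - the independence assumption is doing the work, as the last bullet says.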

Comment by turntrout on Why indoor lighting is hard to get right and how to fix it · 2020-10-29T18:52:33.991Z · LW · GW

Forgive me if I missed it in the post, but might it be cheaper to use some kind of visor if you're only illuminating for personal benefit? Much easier and cheaper to fill your FOV with light that way.

Comment by turntrout on Babble challenge: 50 ways to escape a locked room · 2020-10-26T03:29:38.675Z · LW · GW

I did this with my family! We had a lot of fun.

(Misread prompt as "you have 10 years of food & water")

1. Kick down the door
2. Unlock the door
3. Use high-pressure water source to explode the room

4. Contact someone and get them to open the door

5. Learn how to pick locks and pick the locks

6. Eat through the door

7. Bust through the thin drywall

8. End up in a low-probability universe where you teleport outside the room

9. Wait for someone to come open the door

10. Upload yourself and have yourself reconstructed on the outside

11. Build an AGI to solve the problem

12. Hire someone to come blow up the door

13. Go on an extreme diet and slide under the door

14. Train hard at karate so you can break off the hinges with one blow

15. Use a rocket launcher

16. Climb out through ventilation shaft

17. Synthesize corrosive compound that eats through the wall

18. Tunnel underground with your hands

19. Take screw from glasses, burrow out hole in door so you can reach hand through and unlock it

20. Take your shoe and hit the bolts out

21. Use hemp in food to grow a rope and then slip under the door and wait for someone to trip and then let them know you're there

22. Scream at resonant frequency of the wall so it explodes

23. Trigger sprinkler system to soften drywall

24. Burn down whole room and hope your remains are interred outside of the room

25. Wear your way through the wall

26. Eat so much food that you get bigger than the room and then it explodes

27. Download blueprints for the room and then exploit weak points

28. Use avocado pits to cut out cinderblock

29. Find out how to make explosives from food (methane gas?)

30. Use acidity of something to wear away metal in room

31. Train super hard to run fast enough opposite to Earth's spin to stop its spin and blow the room away via inertia (this wouldn't work)

32. Build a 3D printer that takes food as input and produces keys to the door

33. Jump hard enough to bust head through ceiling

34. Use light + unlimited energy to wood burn out of the room

35. Wave your hands so fast the air turns to plasma

36. Call the police and tell them you have weed in the room

37. Let food rot; bug eggs hatch; bugs chew through the door

38. Hire a hitman to kidnap you

39. Climb out through the window

40. Rip off two of your toes and rub them together to make a fire

41. Wait for tornado

42. Wait for earthquake

43. Wait until sun expands and then "your" molecules likely end up outside of the former room's molecules

44. Spin arms at right angle to get lift to turn yourself into a human helicopter and fly out of the skylight

45. Use tine on belt to pick lock

46. Get the building condemned (via your phone) and torn down

47. Sand away log walls with sandblock

48. Drill through the wall by spinning your hand quickly

49. Scratch out with really long fingernails

50. Yell for help

Comment by turntrout on TurnTrout's shortform feed · 2020-10-26T03:20:57.104Z · LW · GW

If you're tempted to write "clearly" in a mathematical proof, the word quite likely glosses over a key detail you're confused about. Use that temptation as a clue for where to dig in deeper.

At least, that's how it is for me.

Comment by turntrout on TurnTrout's shortform feed · 2020-10-21T02:37:18.929Z · LW · GW
From unpublished work.

The answer to this seems obvious in isolation: shaping helps with credit assignment, rescaling doesn't (and might complicate certain methods in the advantage vs Q-value way). But I feel like maybe there's an important interaction here that could inform a mathematical theory of how a reward signal guides learners through model space?
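For concreteness, one standard formalization of the shaping half of this contrast is the Ng-Harada-Russell potential-based shaping construction; a minimal sketch (the `potential` function and constants are illustrative, not from the unpublished work quoted above):

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Changes the per-step learning signal (which is where it helps
    credit assignment) while provably preserving which policies
    are optimal.
    """
    return r + gamma * potential(s_next) - potential(s)


def rescaled_reward(r, c=10.0):
    """Uniform rescaling: multiplies every reward, hence every return,
    by c. The ranking of policies is untouched, and nothing about
    credit assignment changes - only the scale of the signal."""
    return c * r
```

Along a trajectory the shaping terms telescope, so total shaped return differs from raw return only by the potentials at the endpoints - which is why shaping can redistribute credit across timesteps without changing which behavior is best.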

Comment by turntrout on Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning · 2020-10-20T15:51:40.326Z · LW · GW

I really like this post. Thanks for writing it!

(Why didn't you mention the Slaughterbot Slip-up of '24, though?)

Comment by turntrout on Knowledge, manipulation, and free will · 2020-10-19T14:34:12.175Z · LW · GW

Ideally within the next month!

Comment by turntrout on Knowledge, manipulation, and free will · 2020-10-18T15:44:21.812Z · LW · GW

(I have a big google doc analyzing corrigibility & manipulation from the attainable utility landscape frame; I’ll link it here when the post goes up on LW)

Comment by turntrout on TurnTrout's shortform feed · 2020-10-16T18:24:48.881Z · LW · GW

Epistemic status: not an expert

Understanding Newton's second law, F = dp/dt.

Consider the vector-valued velocity as a function of time, v(t). Scale this by the object's mass m and you get the momentum function over time, p(t) = m·v(t). Imagine this momentum function wiggling around over time, the vector from the origin rotating and growing and shrinking.

The second law says that force is the derivative of this rescaled vector function - if an object is more massive, then the same displacement of this rescaled arrow is a proportionally smaller velocity modification, because of the rescaling!

And also, forces have opposite reactions (by conservation of momentum) and equal reactions (by conservation of energy).
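The force-as-derivative-of-momentum relation can be checked numerically; a toy one-dimensional sketch (the trajectory and masses here are made up):

```python
def momentum(m, v):
    """p = m * v, for scalar (one-dimensional) motion."""
    return m * v


def force_numeric(m, v_of_t, t, dt=1e-6):
    """F = dp/dt, approximated by a central difference on p(t) = m * v(t)."""
    p_plus = momentum(m, v_of_t(t + dt))
    p_minus = momentum(m, v_of_t(t - dt))
    return (p_plus - p_minus) / (2 * dt)
```

For v(t) = 3t (constant acceleration 3) and m = 2, this recovers F = m·a = 6; doubling the mass doubles the force required for the same velocity profile, which is exactly the rescaling point.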

Comment by turntrout on Babble challenge: 50 ways of hiding Einstein's pen for fifty years · 2020-10-15T15:32:57.629Z · LW · GW

I'm going to assume the evil forces knew I had it, but can't read my mind or know everything I did to hide it. They'll spend a reasonable amount of resources looking, but not an infinite amount, and they won't torture me to find out where it is (they know how unreliable torture is as information gain!).

  1. Put the pen in a small steel box, go on a multi-day hike, and randomly bury the box somewhere along the hike. Remember where you put it.
  2. Travel to a new country and keep it in a bank deposit box under a fake name. (Did they have those in 1855?)
  3. Acquire a bunch of pens that look exactly the same, put a different color marking on the interior of each, and give each to a friend. (Figure out some way to make sure the Evil People don't just steal all of the pens?)
  4. Put the pen inside the wall of a friend's house.
  5. Destroy the pen and then remake it in 50 years.
  6. Embed it in a tree core. (Would the tree grow to crush it?)
  7. Commission a copy of the pen and give it to them, claiming it's the real thing.
  8. Have 10 friends over and give them each a pen, only one of which is the Destined Pen. Then, each of them meets 10 people and gives one of them the pen, and so on once again. The final person who receives the actual pen is given the appropriate instructions and payment plan for delivering the pen to Einstein in 50 years (DDOS their resources - they probably can't investigate 1000 people).
  9. Find a box and put the pen in it.
  10. Disassemble the pen and entrust the pieces to a pensmith (is that a thing?)
  11. Preserve the pen in some kind of resin
  12. Form a concrete block around the pen, or otherwise embed it in some kind of wall, and then take it out when needed
  13. Go spelunking and leave the pen wedged in one of the caves in a memorable but rarely frequented location (I notice I'm focusing on the "hide from dedicated group of people" constraint, which is slowing me down. I'm going to focus on just "places you could put a pen" without worrying at all about hiding for now.)
  14. On the ground.
  15. In the attic.
  16. Bird's nest
  17. Buried in dirt
  18. Library
  19. Carriage
  20. Warehouse (Ok, back to solutions that have at least a small chance of working)
  21. Pay someone to hide the pen, split up the location info into a cryptographically secure 3-out-of-5 secret-sharing scheme, and then dole out the info to me and four close friends who live far away and move frequently. In 1855, it would be logistically hard to coordinate to track 3 of us down at the same time. They pass on the secret to their children and spouses, in case any of us die before 50 years pass.
  22. Unethical: kill the evil forces?
  23. Unethical: take over the local government and have them protect the pen.
  24. Talk the bad guys out of wanting the pen.
  25. Set up an elaborate dungeon below my house (Legend of Zelda-style), at the end of which is a fake copy of the pen. Keep the real pen in my desk drawer.
  26. Similarly, go to a lot of trouble to look like I'm building up a vault for the pen, all the while keeping "the rest" of my pens on a pedestal in the middle of my house. Of course, the real pen is there.
  27. Persuade the bad guys that pens are really annoying and you should use a mechanical pencil (that totally existed in 1855, right?), if you aren't going to use a digital writing device (which they aren't going to use).
  28. Give a fake pen to a major world power, and then tell the bad guys it's already out of my hands, and eat popcorn while the two groups fight.
  29. Find Einstein's ancestors and convince them to pass down the pen.
  30. Using extreme computational resources and technological prowess, build a capsule that's shot into LEO which is timed to reenter Earth's atmosphere in front of Einstein right when he's looking for a new pen.
  31. Deduce exactly what kind of simulation we're on, then execute a bizarre policy that mind-hacks whoever's observing us into spawning 1x Pen of Destiny for Einstein at exactly the right time.
  32. Compute a Butterfly Effect policy for a molecularly identical pen spontaneously assembling itself in front of Einstein at the appropriate time (this is totally a thing, right?)
  33. Wedge the sanitized pen between my radius and ulna. That totally wouldn't go wrong, and it'd be too gross for them to want to retrieve.
  34. Buy a farm and hide it in a hay bale.
  35. Go to a stream and instruct a friend to wait some unknown # of miles downstream. Put the pen in a bottle, float it downstream, and the friend retrieves it and waits downstream. Maybe a festival could be going on near the stream at the same time. In any case, even if I'm being trailed at that moment, they wouldn't know how far downstream it went, which would give my friend an important head start for hiding it in a different town.
  36. Pay someone to take it to a tribe which will be convinced that it should be worshipped as a sacred object, and then steal it back in 50 years.
  37. Memorize a binary sequence using a memory palace, which I use as an XOR cipher on a series of coin flips which indicate: "heads: go north 100 feet; tails: go east 100 feet". Flip 100 coins and write down the result, and then bury the coin in the place indicated by the flips XOR the sequence. (This is basically a one-time pad for north-eastern lattice paths)
  38. Do the above, but just through the pseudo-randomly generated memorized sequence. Also, have a habit of taking these north-eastern lattice path walks randomly for a few days before and after actually burying the pen, so they don't know the walk on which I buried the pen.
  39. Enter into a 2-out-of-2 secret sharing scheme with someone difficult to intimidate, like a major world leader.
  40. Do 38, but using a fall tree branch as my (not technically) pseudorandom source: given a branch, choose the one that has the reddest leaf on it, using an appropriate embedding from branch choices into directional bearings for the next step.
  41. Put the pen inside a football that won't be used. Who would do that?
  42. Preserve the pen in permafrost somewhere stable (is that a thing? would it be crushed by changing pressures?)
  43. Encase the pen in diamond or some even harder material, which Einstein will need to invent molecular nanotech to undo. He wants his miracle papers, let him do the impossible!
  44. If I'm making myself as capable as in #43, why not just build an AGI to ensure Einstein gets it? Might as well.
  45. Toss the pen into a furnace in front of the bad guys; unbeknownst to them, the furnace has a secret compartment in which the pen will be safe from the heat.
  46. Keep the pen inside a wooden compartment, which I nail to the underside of the nearest bridge (so that if the pen falls, it falls over land).
  47. Fake my own death and inter the pen with the fake body.
  48. 47, but hide the pen in the gravestone.
  49. 48, but a friend surreptitiously removes the pen from the gravestone compartment during the funeral.
  50. Just give the bad guys the pen and take it back before Einstein needs it.
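The one-time-pad walk in items 37-38 can be sketched as follows (the key bits, coin flips, and 100-foot step size are all hypothetical):

```python
def burial_walk(key_bits, coin_flips, step=100):
    """Decrypt coin flips with a memorized key (XOR, i.e. a one-time
    pad) to get a secret north-eastern lattice path.

    Decrypted 1 -> go north `step` feet; 0 -> go east `step` feet.
    Returns the burial point as (feet_north, feet_east).
    """
    assert len(key_bits) == len(coin_flips)
    north = east = 0
    for k, c in zip(key_bits, coin_flips):
        if k ^ c:  # decrypted heads: north
            north += step
        else:      # decrypted tails: east
            east += step
    return north, east
```

Anyone who observes the coin flips but lacks the memorized key sees a uniformly random lattice path - the defining property of a one-time pad.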
Comment by turntrout on Knowledge, manipulation, and free will · 2020-10-14T21:02:06.525Z · LW · GW

OK, but there's a difference between "here's a definition of manipulation that's so waterproof you couldn't break it if you optimized against it with arbitrarily large optimization power" and "here's my current best way of thinking about manipulation." I was presenting the latter, because it helps me be less confused than if I just stuck to my previous gut-level, intuitive understanding of manipulation.

Edit: Put otherwise, I was replying more to your point (1) than your point (2) in the original comment. Sorry for the ambiguity!

Comment by turntrout on How much to worry about the US election unrest? · 2020-10-14T15:02:17.947Z · LW · GW

In your original comment, you wrote:

one possible implication being he will also not accept the results

The first article in that search says:

Before leaving the state, Biden told reporters his comments were ‘taken a little out of context’ and added that ‘I’m going to accept the outcome of this election, period.’

Comment by turntrout on Knowledge, manipulation, and free will · 2020-10-14T01:59:35.282Z · LW · GW

Not Stuart, but I agree there's overlap here. Personally, I think about manipulation as when an agent's policy robustly steers the human into taking a certain kind of action, in a way that's robust to the human's counterfactual preferences. Like if I'm choosing which pair of shoes to buy, and I ask the AI for help, and no matter what preferences I had for shoes to begin with, I end up buying blue shoes, then I'm probably being manipulated. A non-manipulative AI would act in a way that increases my knowledge and lets me condition my actions on my preferences.
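This notion - the human's final action being insensitive to their counterfactual preferences - can be operationalized crudely. A toy sketch (all names and the shoe example are illustrative):

```python
def looks_manipulative(policy, preferences, choose):
    """Run the interaction under each counterfactual preference.

    policy(pref) is the AI's behavior; choose(pref, advice) is the
    human's resulting action. If the action never varies with the
    preference, the policy is robustly steering the human.
    """
    outcomes = {choose(pref, policy(pref)) for pref in preferences}
    return len(outcomes) == 1 and len(preferences) > 1
```

A pushy policy that always says "buy blue" comes out manipulative under this check, while one that conditions its advice on the stated preference does not. (This is a crude operationalization - it would flag a policy that correctly steers everyone away from a poisoned shoe, so it's a starting point, not a definition.)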

Comment by turntrout on How much to worry about the US election unrest? · 2020-10-12T19:35:35.665Z · LW · GW


Comment by turntrout on How much to worry about the US election unrest? · 2020-10-12T15:31:46.261Z · LW · GW

I'm not very satisfied by answers like "X won't support him, because that's illegal" and "it's unconstitutional for the federal government to do X, so they won't." I think these usually are correct, but over the last four years we have seen rapid deterioration of our ability to: agree on an objective reality, have an executive branch which abides by the law absent immediate and tangible enforcement mechanisms (remember when congressional subpoenas were at least often answered? now they're ~always ignored AFAICT), have common knowledge that the law is the Law and if you break it you will be punished (obviously, rich&powerful would get more leeway in this calculation), etc.

I think many of these things have degraded and am no longer sure that anything would really stop red states from ignoring the popular results and sending their own set of electors. It's already being discussed, and red officials have admitted they are discussing it without immediately walking it back / distancing themselves from the prospect. Maybe the Supreme Court would be enough to stop that, if they so chose. (Would they? Aren't states technically allowed to choose electors however they please?)

Comment by turntrout on TurnTrout's shortform feed · 2020-10-09T16:17:36.093Z · LW · GW

I went to the doctor's yesterday. This was embarrassing for them on several fronts.

First, I had to come in to do an appointment which could be done over telemedicine, but apparently there are regulations against this.

Second, while they did temp checks and required masks (yay!), none of the nurses or doctors actually wore anything stronger than a surgical mask. I'm coming in here with a KN95 + goggles + face shield because why not take cheap precautions to reduce the risk, and my own doctor is just wearing a surgical? I bought 20 KN95s for, like, 15 bucks on Amazon.

Third, and worst of all, my own doctor spouted absolute nonsense. The mildest insinuation was that surgical facemasks only prevent transmission, but I seem to recall that many kinds of surgical masks halve your chances of infection as well.

Then, as I understood it, he first claimed that coronavirus and the flu have comparable case fatality rates. I wasn't sure if I'd heard him correctly - this was an expert talking about his area of expertise, so I felt like I had surely misunderstood him. I was taken aback. But, looking back, that's what he meant.

He went on to suggest that we can't expect COVID immunity to last (wrong) but also that we need to hit 70% herd immunity (wrong). How could you even believe both of these things at the same time? Under those beliefs, are we all just going to get sick forever? Maybe he didn't notice the contradiction because he made the claims a few minutes apart.

Next, he implied that it's not a huge deal that people have died because a lot of them had comorbidities. Except that's not how comorbidities and counterfactual impact work. "No one's making it out of here alive", he says. An amusing rationalization.

He also claimed that nursing homes have an average stay length of 5 months. Wrong. AARP says it's 1.5 years for men, 2.5 years for women, but I've seen other estimates elsewhere, all much higher than 5 months. Not sure what the point of this was - old people are 10 minutes from dying anyways? What?

Now, perhaps I misunderstood or misheard one or two points. But I'm pretty sure I didn't mishear all of them. Isn't it great that I can correct my doctor's epidemiological claims after reading Zvi's posts and half of an epidemiology textbook? I'm glad I can trust my doctor and his epistemology.

Comment by turntrout on "Zero Sum" is a misnomer. · 2020-10-02T20:05:53.511Z · LW · GW

I read "feasible" as something like "rationalizable." I think it would have been much clearer if you had said "if no strategy profiles are Pareto over any others."

Comment by turntrout on Math That Clicks: Look for Two-Way Correspondences · 2020-10-02T03:58:35.635Z · LW · GW

Wikipedia says if the Lipschitz constant is exactly 1, it's a "non-expansive map". But yes, contraction maps have some Lipschitz constant L < 1 that enforces the behavior you describe. However, notice we still have "math contraction ⟹ intuitive contraction" here, so it has the reverse-direction correspondence. Intriguingly, we're missing part of "intuitive contraction (bringing things closer together) ⟹ math contraction" for the L = 1 case, as you point out, so we don't have the forward direction fulfilled.
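To make the distinction concrete, here's a small numeric sketch (my own illustration, not from the original discussion): a genuine contraction shrinks the distance between any two orbits by its Lipschitz constant at every step, while a non-expansive map with L = 1 need not shrink it at all.

```python
def dist_after(h, x, y, n):
    """Distance between the orbits of x and y after n steps of h."""
    for _ in range(n):
        x, y = h(x), h(y)
    return abs(x - y)

f = lambda x: x / 2 + 1   # contraction, L = 1/2 (fixed point at x = 2)
g = lambda x: x + 1       # non-expansive, L = 1 (an isometry, no fixed point)

print(dist_after(f, 0.0, 10.0, 10))  # distance shrinks to 10 / 2**10
print(dist_after(g, 0.0, 10.0, 10))  # distance preserved: still 10.0
```

The map g brings no points closer together, yet it sits exactly at the L = 1 boundary - which is why "non-expansive" earns a separate name from "contraction".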

Comment by turntrout on "Zero Sum" is a misnomer. · 2020-10-01T18:52:18.459Z · LW · GW

You're thinking of a Kaldor-Hicks optimality frontier for {outcomes with maximal total payoff}, while the Pareto frontier is {maximal elements in the unanimous-agreement preference ordering over outcomes}.
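As a toy illustration of the difference (my own made-up payoffs, not from the post): the Kaldor-Hicks set keeps only outcomes maximizing total payoff, while the Pareto frontier keeps every outcome no one can improve on without someone losing.

```python
outcomes = [(3, 3), (5, 0), (0, 5), (1, 1)]

def pareto_improves(a, b):
    """a is a Pareto improvement over b: everyone weakly better, someone strictly."""
    return all(x >= y for x, y in zip(a, b)) and a != b

pareto = [o for o in outcomes
          if not any(pareto_improves(p, o) for p in outcomes)]
best_total = max(sum(o) for o in outcomes)
kaldor_hicks = [o for o in outcomes if sum(o) == best_total]

print(pareto)        # [(3, 3), (5, 0), (0, 5)]
print(kaldor_hicks)  # [(3, 3)]
```

(5, 0) and (0, 5) are Pareto-optimal despite lower total payoff: moving away from either makes one player worse off.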

Comment by turntrout on "Zero Sum" is a misnomer. · 2020-10-01T18:44:54.961Z · LW · GW

Right, I understand how this correctly labels certain cases, but that doesn't seem to address my question?

Comment by turntrout on "Zero Sum" is a misnomer. · 2020-09-30T19:44:28.028Z · LW · GW

So, we could consider a game completely adversarial if it has a structure like this: no strategy profiles are a Pareto improvement over any others. In other words, the feasible outcomes of the game equal the game's Pareto frontier. All possible outcomes involve trade-offs between players.

I must have missed some key word - by this definition, wouldn't common-payoff games be "completely adversarial", because the "feasible" outcomes equal the Pareto frontier under the usual assumptions?

Comment by turntrout on Puzzle Games · 2020-09-28T23:07:27.835Z · LW · GW

(Vague spoilers about non-obvious puzzles in Fez)

I don't remember there being an abrupt change; it felt like there were secrets under the surface. I didn't have to read around to figure out that the game brims with coded messages - discovering them was a natural part of exploring the world and understanding what happened to it. But perhaps it's in the eye of the beholder.

Comment by turntrout on Puzzle Games · 2020-09-28T02:49:31.275Z · LW · GW

I think you missed out on most of the game. (Spoilers as to the nature of the game's deeper puzzles)

Comment by turntrout on Puzzle Games · 2020-09-27T23:01:59.795Z · LW · GW

I remember enjoying Fez when I was in college. Past-me would probably put it at Tier 2.

Comment by turntrout on TurnTrout's shortform feed · 2020-09-27T22:52:55.668Z · LW · GW

What is "real"? I think about myself as a computation embedded in some other computation (i.e. a universe-history). I think "real" describes hypotheses about the environment where my computation lives. What should I think is real? That which an "ideal embedded reasoner" would assign high credence. However that works.

This sensibly suggests that Gimli-in-actual-Ea (LOTR) should believe he lives in Ea, and that Ea is real, even though it isn't our universe's Earth. Also, the notion accounts for indexical uncertainty by punting it to how embedded reasoning should work (a la radical probabilism), without being tautological. Also, it supports both the subjective nature of what one should call "real", and the notion of an actual out-there-somewhere shared reality (multiple computations can be embedded within the same universe-history).

Comment by turntrout on TurnTrout's shortform feed · 2020-09-26T22:24:17.184Z · LW · GW

Reasoning about learned policies via formal theorems on the power-seeking incentives of optimal policies

One way instrumental subgoals might arise in actual learned policies: we train a proto-AGI reinforcement learning agent with a curriculum including a variety of small subtasks. The current theorems show sufficient conditions for power-seeking tending to be optimal in fully-observable environments; many environments meet these sufficient conditions; optimal policies aren't hard to compute for the subtasks. One highly transferable heuristic would therefore be to gain power in new environments, and then figure out what to do for the specific goal at hand. This may or may not take the form of an explicit mesa-objective embedded in e.g. the policy network.

Later, the heuristic has the agent seek power for the "real world" environment.

(Optimal Farsighted Agents Tend to Seek Power is rather dated and will be updated soon.)

Comment by turntrout on Book Review: Working With Contracts · 2020-09-18T03:36:57.101Z · LW · GW

Once formed, a contract acts as custom, private law between the parties. 

This is a cool way of understanding contracts. 

I'm putting this on the shelf of facepalm-obvious-but-beautiful realizations like

  • Medicine is the science of healing, not just a collection of random facts about what pills to take
  • Math is about the inescapable and immutable consequences of basic rules, not just about playing with integrals and numbers
  • Physics is, in large part, about discovering the transition rules of the universe
  • Machine learning is about the beautiful ideal learn- yeah, no, machine learning is still just a mess