Does SGD Produce Deceptive Alignment? 2020-11-06T23:48:09.667Z
What posts do you want written? 2020-10-19T03:00:26.341Z
The Solomonoff Prior is Malign 2020-10-14T01:33:58.440Z
What are objects that have made your life better? 2020-05-21T20:59:27.653Z
What are your greatest one-shot life improvements? 2020-05-16T16:53:40.608Z
Training Regime Day 25: Recursive Self-Improvement 2020-04-29T18:22:03.677Z
Training Regime Day 24: Resolve Cycles 2 2020-04-28T19:00:09.060Z
Training Regime Day 23: TAPs 2 2020-04-27T17:37:15.439Z
Training Regime Day 22: Murphyjitsu 2 2020-04-26T20:18:50.505Z
Training Regime Day 21: Executing Intentions 2020-04-25T22:16:04.761Z
Training Regime Day 20: OODA Loop 2020-04-24T18:11:30.506Z
Training Regime Day 19: Hamming Questions for Potted Plants 2020-04-23T16:00:10.354Z
Training Regime Day 18: Negative Visualization 2020-04-22T16:06:46.138Z
Training Regime Day 17: Deflinching and Lines of Retreat 2020-04-21T17:45:34.766Z
Training Regime Day 16: Hamming Questions 2020-04-20T14:51:31.310Z
Mark Xu's Shortform 2020-03-10T08:11:23.586Z
Training Regime Day 16: Hamming Questions 2020-03-01T18:46:32.335Z
Training Regime Day 15: CoZE 2020-02-29T17:13:42.685Z
Training Regime Day 14: Traffic Jams 2020-02-28T17:52:28.354Z
Training Regime Day 13: Resolve Cycles 2020-02-27T17:45:07.845Z
Training Regime Day 12: Focusing 2020-02-26T19:07:15.407Z
Training Regime Day 11: Socratic Ducking 2020-02-25T17:19:57.320Z
Training Regime Day 10: Systemization 2020-02-24T17:20:15.385Z
Training Regime Day 9: Double-Crux 2020-02-23T18:08:31.108Z
Training Regime Day 8: Noticing 2020-02-22T19:47:03.898Z
Training Regime Day 7: Goal Factoring 2020-02-21T17:55:29.848Z
Training Regime Day 6: Seeking Sense 2020-02-20T17:33:29.011Z
Training Regime Day 5: TAPs 2020-02-19T18:11:05.649Z
Training Regime Day 4: Murphyjitsu 2020-02-18T17:33:12.523Z
Training Regime Day 3: Tips and Tricks 2020-02-17T18:53:24.808Z
Training Regime Day 2: Searching for bugs 2020-02-16T17:16:32.606Z
Training Regime Day 1: What is applied rationality? 2020-02-15T21:03:32.685Z
Training Regime Day 0: Introduction 2020-02-14T08:22:19.851Z


Comment by mark-xu on A space of proposals for building safe advanced AI · 2020-11-21T17:17:51.676Z · LW · GW

I claim that if we call the combination of the judge plus one debater Amp(M), then we can think of the debate as M* being trained to beat Amp(M) by Amp(M)'s own standards.

This seems like a reasonable way to think of debate.

I think, in practice (if this even means anything), the power of debate is quite bounded by the power of the human, so some other technique is needed to make the human capable of supervising complex debates, e.g. imitative amplification.

Comment by mark-xu on A space of proposals for building safe advanced AI · 2020-11-20T16:15:11.654Z · LW · GW

Debate: train M* to win debates against Amp(M).

I think Debate is closer to "train M* to win debates against itself as judged by Amp(M)".

Comment by mark-xu on Mark Xu's Shortform · 2020-11-08T22:34:24.948Z · LW · GW

If you have DAI right now, minting on and swapping yTrump for nTrump on is an almost guaranteed 15% profit.

Comment by mark-xu on Does SGD Produce Deceptive Alignment? · 2020-11-07T16:40:33.647Z · LW · GW

Yep. Meant to say "if a model knew that it was in its last training episode and it wasn't going to be deployed." Should be fixed.

Comment by mark-xu on Hammers and Nails · 2020-11-03T01:41:27.178Z · LW · GW

I think murphyjitsu is my favorite technique.

  1. sometimes failing lets you approach a problem from a different angle
  2. humor often results from failure, so anticipating how you'll fail and nudging to make it more probable might create more humor
  3. murphyjitsu is normally used in making plans, but you can murphyjitsu your opponent's plans to identify the easiest ways to break them
  4. micro-murphyjitsu is the art of constantly simulating reality like 5 seconds before, sort of like overclocking your OODA loop
  5. murphyjitsu is a fast way to tell if your plan is good or not - you don't always have to make it better
  6. you can get intuitive probabilities for various things by checking how surprised you are at those things
  7. if you imagine your plan succeeding instead of failing, then it might cause you to realize some low-probability high-impact actions to take
  8. you can murphyjitsu plans that you might make to get a sense of the tractability of various goals
  9. murphyjitsu might help correct for overconfidence if you imagine ways you could be wrong every time you make a prediction
  10. Can murphyjitsu things that aren't plans. E.g. you can suppose the existence of arguments that would change your mind.
Comment by mark-xu on MikkW's Shortform · 2020-10-29T22:26:17.301Z · LW · GW

is a result showing that Aumann's Agreement is computationally efficient under some assumptions, which might be of interest.

Comment by mark-xu on What are good election betting opportunities? · 2020-10-29T19:04:33.240Z · LW · GW

is a doc I wrote explaining how to do this in a way that is slightly less risky than betting on catnip directly.

Comment by mark-xu on Introduction to Cartesian Frames · 2020-10-25T17:41:40.340Z · LW · GW

Good point - I think the correct definition is something like "rows (or sets of rows) for which there exists a row which is disjoint"

Comment by mark-xu on Mark Xu's Shortform · 2020-10-22T21:53:37.386Z · LW · GW

This made me chuckle. More humor

  • Rationalists taxonomizing rationalists
  • Mesa-rationalists (the mesa-optimizers inside rationalists)
  • carrier pigeon rationalists
  • proto-rationalists
  • not-yet-born rationalists
  • literal rats
  • frequentists
  • group-house rationalists
  • EA forum rationalists
  • academic rationalists
  • meme rationalists


Comment by mark-xu on Introduction to Cartesian Frames · 2020-10-22T16:43:31.163Z · LW · GW

This is very exciting. Looking forward to the rest of the sequence.

As I was reading, I found myself reframing a lot of things in terms of the rows and columns of the matrix. Here's my loose attempt to rederive most of the properties under this view.

  • The world is a set of states. One way to think about these states is by putting them in a matrix, which we call a "Cartesian frame." In this frame, the rows of the matrix are possible "agents" and the columns are possible "environments".
    • Note that you don't have to put all the states in the matrix.
  • Ensurables are the part of the world that the agent can always ensure we end up in. Ensurables are the rows of the matrix, closed under supersets
  • Preventables are the part of the world that the agent can always ensure we don't end up in. Preventables are the complements of the rows, closed under subsets
  • Controllables are parts of the world that are both ensurable and preventable. Controllables are rows (or sets of rows) for which there exists a row that is disjoint. [edit: previous definition of "contains elements not found in other rows" was wrong, see comment by crabman]
  • Observables are parts of the environment that the agent can observe and act conditionally according to. Observables are columns such that for every pair of rows there is a third row that equals the 1st row if the environment is in that column and the 2nd row otherwise. This means that for every two rows, there's a third row that's made by taking the first row and swapping elements with the 2nd row where it intersects with the column.
    • Observables have to be sets of columns because if they weren't, you can find a column that is partially observable and partially not. This means you can build an action that says something like "if I am observable, then I am not observable. If I am not observable, I am observable" because the swapping doesn't work properly.
    • Observables are closed under boolean combination (note it's sufficient to show closure under complement and unions):
      • Since swapping index 1 of a row is the same as swapping all non-1 indexes, observables are closed under complements.
      • Since you can swap indexes 1 and 2 by first swapping index 1, then swapping index 2, observables are closed under union.
        • This is equivalent to saying "If A or B, then a0, else a2" is logically equivalent to "if A, then a0, else (if B, then a0, else a2)"
  • Since controllables are rows with specific properties and observables are columns with specific properties, then nothing can be both controllable and observable. (The only possibility is the entire matrix, which is trivially not controllable because it's not preventable)
    • This assumes that the matrix has at least one column
  • The image of a cartesian frame is the actual matrix part.
  • Since an ensurable is a row (or superset) and an observable is a column (or set of columns), then if something is ensurable and observable, then it must contain every column, so it must be the whole matrix (image).
  • If the matrix has 1 or 0 rows, then the observable constraint is trivially satisfied, so the observables are all possible sets of (possible) environment states (since 0/1 length columns are the same as states).
    • "0 rows" doesn't quite make sense, but just pretend that you can have a 0 row matrix which is just a set of world states.
  • If the matrix has 0 columns, then the ensurable/preventable constraint is trivially satisfied, so the ensurables are the same as the preventables are the same as the controllables, which are all possible sets of (possible) environment states (since "length 0" rows are the same as states).
    • "0 columns" doesn't make that much sense either, but pretend that you can have a 0 column matrix which is just a set of world states.
  • If the matrix has exactly 1 column, then the ensurable/preventable constraint is trivially satisfied for states in the image (matrix), so the ensurables are all non-empty sets of states in the matrix (since length 1 columns are the same as states), closed under union with states outside the matrix. It should be easy to see that controllables are all possible sets of states that intersect the matrix non-trivially, closed under union with states outside the matrix.
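
The row/column reading above can be checked on a small example. This is only a sketch of the paraphrase in this comment, not the formal definitions from the post: a frame is a list of rows, each row is a tuple of world states indexed by column, and observables are given as sets of column indexes. The umbrella frame and all function names are made up for illustration.

```python
from itertools import product

def ensurable(frame, states):
    # Some row lies entirely inside `states` (so supersets stay ensurable).
    return any(set(row) <= states for row in frame)

def preventable(frame, states):
    # Some row is disjoint from `states` (so subsets stay preventable).
    return any(set(row).isdisjoint(states) for row in frame)

def controllable(frame, states):
    return ensurable(frame, states) and preventable(frame, states)

def observable(frame, columns):
    # For every pair of rows there must be a third row that copies the
    # first row inside `columns` and the second row everywhere else.
    width = len(frame[0]) if frame else 0
    for a0, a1 in product(frame, repeat=2):
        wanted = tuple(a0[e] if e in columns else a1[e] for e in range(width))
        if wanted not in frame:
            return False
    return True

# Hypothetical frame: column 0 is rain, column 1 is sun.
frame = [
    ("u_rain", "u_sun"),  # always take an umbrella
    ("n_rain", "n_sun"),  # never take an umbrella
    ("u_rain", "n_sun"),  # umbrella only if it rains
    ("n_rain", "u_sun"),  # umbrella only if it is sunny
]
umbrella = {"u_rain", "u_sun"}
everything = umbrella | {"n_rain", "n_sun"}

print(controllable(frame, umbrella))    # has a row inside and a row outside
print(controllable(frame, everything))  # whole matrix: no disjoint row
print(observable(frame, {0}))           # agent can condition on rain
print(observable(frame[:2], {0}))       # without the conditional rows it cannot
```

With zero rows the loop in `observable` never runs, which matches the "trivially satisfied" case in the last few bullets.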
Comment by mark-xu on Introduction to Cartesian Frames · 2020-10-22T14:54:56.597Z · LW · GW

In 4.1:

Given a0 and a1, since S∈Obs(C), there exists an a2∈A such that for all e∈E, we have a2∈if(S,a0,a1). Then, since T∈Obs(C), there exists an a3∈A such that for all e∈E, we have a3∈if(S,a0,a2). Unpacking and combining these, we get for all e∈E, a3∈if(S∪T,a0,a1). Since we could construct such an a3 from an arbitrary a0,a1∈A, we know that S∪T∈Obs(C). □

I think there's a typo here. Should be a3∈if(T,a0,a2), not a3∈if(S,a0,a2).

(also not sure how to copy latex properly).

Comment by mark-xu on Babble challenge: 50 ways of solving a problem in your life · 2020-10-22T13:44:29.281Z · LW · GW

problem: I don't do enough focused work in a day.

  1. set aside set times for focused work via calendar
  2. put "do focused work" on my todo list (actually already did this and worked surprisingly well for a week - why doesn't it work as well anymore?)
  3. block various chatting apps
  4. block lesswrong?
  5. do pomodoros
  6. use some coworking space to encourage focus
  7. take more breaks
  8. eat healthier food (possibly no carbs) to have more energy
  9. get a better sleep schedule to have more energy
  10. meditate more for better meta-cognition and focus
  11. try to do deliberate practice on doing focused work
  12. install a number of TAPs related to suppressing desires for distraction, e.g. "impulse to stop working -> check pomodoro timer"
  13. I'm told complice is useful
  14. daily reviews might be helpful?
  15. be more specific when doing weekly review
  16. make more commitments to other people about the amount of output I'm going to have, creating social pressure to actually produce that amount of output
  17. be more careful when scheduling calls with people so I have long series of uninterrupted hours
  18. take more naps when I notice I'm losing focus
  19. be more realistic about the amount of focused work I can do in a day (does "realize this isn't actually a problem" count as solving it? seems like yes)
  20. vary the length of pomodoros
  21. do resolve cycles for solutions to the problem, implementing some of them
  22. read various productivity books, like the procrastination equation, GTD, tiny habits, etc.
  23. exercise more for more energy (unfortunately, the mind is currently still embodied)
  24. make sure I'm focusing on the right things - better to spend half the time focusing on the most important thing than double the time on the 2nd most important thing
  25. spend more time working with people
  26. stop filling non-work time with activities that cause mental fatigue, like reading, podcasts, etc.
  27. stop doing miscellaneous things from my todolist during "breaks", e.g. don't do laundry between pomodoros, just lie on the floor and rest
  28. get into a better rhythm of work/break cycles, e.g. treat every hour as a contiguous block by default, scheduling calls on hour demarcations only
  29. use laptop instead of large monitor - large screens might make it easier to get distracted
  30. block the internet on my computer during certain periods of time so I can focus on writing
  31. take various drugs that give me more energy, e.g. caffeine, nicotine, and other substances
  32. stop drinking things like tea - the caffeine might give more energy, but make focusing harder
  33. wear noise-canceling headphones to block out distractions from noise
  34. listen to music designed to encourage focus, like cool rhythms or video game music
  35. work on things that are exciting - focus isn't a problem if they're intrinsically enjoyable
  36. Ben Kuhn has some good tips - check those out again
  37. RescueTime says most of my distracting time is on messenger and signal. I think quarantine is messing with my desire for social interaction. Figure out how to replace that somehow?
  38. communicate via email/google doc instead of instant messaging
  39. make sure to have snacks to keep up blood sugar
  40. alternate between standing desk and sitting desk to add novelty
  41. reduce cost for starting to do focused work by having a clear list of focused work that needs to be done, leaving computer in state ready to start immediately upon coming back to it
  42. nudge myself into doing focused work by doing tasks that require micro-focus first, like make metaculus predictions, then move on to more important focused work
  43. ask a LW question about how to do more focused work and read the answers
  44. work on more physical substrates, e.g. paper+pen, whiteboard
  45. use a non-linux operating system to get access to better tools for focusing, like cold turkey, freedom, etc.
  46. switch mouse to left hand which will cause more effort to be needed to mindlessly use computer, potentially decreasing mindlessness
  47. acquire more desktoys to serve as non-computer distractions that might preserve focus better
  48. practice focusing on non-work thing, e.g. by studying a random subject, playing a game I don't like, being more mindful in everyday life, etc.
  49. do more yoga to feel more present in body
  50. TAP common idle activity I do with "focus on work", e.g. crack knuckles, stretch arms, adjust seat.

Time taken: 20 minutes

More things I thought of after reading Rafael Harth's response:

  1. use something like beeminder to do more focused work
  2. do research directly into what causes some people to be better at focusing than others
  3. ask people that seem to be good at doing focused work for tips
  4. reread Deep Work and take it more seriously
Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-22T13:19:31.651Z · LW · GW

I personally see no fundamental difference between direct and indirect ways of influence, except in so far as they relate to stuff like expected value.

I agree that given the amount of expected influence, other universes are not high on my priority list, but they are still on my priority list. I expect the same for consequentialists in other universes. I also expect consequentialist beings that control most of their universe to get around to most of the things on their priority list, hence I expect them to influence the Solomonoff prior.

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-22T13:14:49.316Z · LW · GW

Consequentialists can reason about situations in which other beings make important decisions using the Solomonoff prior. If multiple beings are simulating them, they can decide randomly (because having e.g. 1/100 of the resources is better than none, which is the expectation of "blind mischievousness").

An example of this sort of reasoning is Newcomb's problem with the knowledge that Omega is simulating you. You get to "control" the result of your simulation by controlling how you act, so you can influence whether or not Omega expects you to one-box or two-box, controlling whether there is $1,000,000 in one of the boxes.
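
A minimal sketch of that example, assuming Omega's simulation is literally a second call to the same policy function. The $1,000 in the second box is the usual Newcomb setup rather than something stated above, and `newcomb_payoff` is a made-up name.

```python
def newcomb_payoff(policy):
    # Omega "simulates" the agent by running the same policy.
    predicted = policy()
    opaque_box = 1_000_000 if predicted == "one-box" else 0
    small_box = 1_000  # assumed standard value, not from the comment

    # The real agent then acts, using the same policy.
    choice = policy()
    if choice == "one-box":
        return opaque_box
    return opaque_box + small_box

print(newcomb_payoff(lambda: "one-box"))
print(newcomb_payoff(lambda: "two-box"))
```

Because both calls go through the same function, choosing how to act also chooses what the simulation does, which is the sense of "control" used here.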

Comment by mark-xu on Mark Xu's Shortform · 2020-10-21T18:45:07.688Z · LW · GW

My current taxonomy of rationalists is:

  • LW rationalists (HI!)
  • Facebook rationalists
  • Twitter rationalists
  • Blog rationalists
  • Internet-invisible rationalists

Are there other types of rationalists? Maybe like group-chat rationalists? or podcast rationalists? google doc rationalists?

Comment by mark-xu on What posts do you want written? · 2020-10-21T17:24:07.047Z · LW · GW

An intuitive explanation of the Kelly criterion, with a bunch of worked examples. Zvi's post is good but lacks worked examples and justification for heuristics. Jacobian advises us to Kelly bet on everything, but I don't understand what a "Kelly bet" is in all but the simplest financial scenarios.

Comment by mark-xu on What posts do you want written? · 2020-10-19T19:37:40.657Z · LW · GW

Maybe you mean that "body memory" is an intuitive subconscious process in the brain?

Yes, but I like thinking of it as "body memory" because it is easier to conceptualize.

Comment by mark-xu on What posts do you want written? · 2020-10-19T03:12:54.637Z · LW · GW

I want more people to write down their models for various things. For example, a model I have of the economy is that it's a bunch of boxes with inputs and outputs that form a sparse directed graph. The length of the shortest cycle controls things like economic growth and AI takeoff speeds.
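
To make "length of the shortest cycle" concrete, here is a rough sketch that finds it by breadth-first search from each box. The graph is a dict from box to the boxes it feeds; the example economy is invented purely for illustration.

```python
from collections import deque

def shortest_cycle_length(graph):
    """Length of the shortest directed cycle, or None if there is none."""
    best = None
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt == start:
                    # BFS visits nodes in order of distance, so the first
                    # edge back to `start` closes the shortest cycle through it.
                    length = dist[node] + 1
                    if best is None or length < best:
                        best = length
                    queue.clear()
                    break
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return best

# Hypothetical economy: outputs of one box are inputs to the next.
economy = {
    "mines": ["steel"],
    "steel": ["tools", "factories"],
    "factories": ["tools"],
    "tools": ["mines"],
}
print(shortest_cycle_length(economy))  # mines -> steel -> tools -> mines
```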

Another example is that people have working memory in both their brains and their bodies. When their brain-working-memory is full, information gets stored in their bodies. Techniques like focusing are often useful to extract information stored in body-working-memory.

Comment by mark-xu on What posts do you want written? · 2020-10-19T03:07:18.720Z · LW · GW

A minimal-assumption description of Updateless Decision Theory. This wiki page describes the basic concept, but doesn't include motivation, examples or intuition.

Comment by mark-xu on What posts do you want written? · 2020-10-19T03:04:23.408Z · LW · GW

A thorough description of how to do pair debugging, a CFAR exercise partially described here.

Comment by mark-xu on What posts do you want written? · 2020-10-19T03:03:08.955Z · LW · GW

A review of Thinking Fast and Slow that focuses on whether or not various parts of the book replicated.

Comment by mark-xu on What posts do you want written? · 2020-10-19T03:02:11.030Z · LW · GW

A solid, minimal-assumption description of value handshakes. This SSC post contains the best description of which I'm aware, which I think is slightly sad:

Values handshakes are a proposed form of trade between superintelligences. Suppose that humans make an AI which wants to convert the universe into paperclips. And suppose that aliens in the Andromeda Galaxy make an AI which wants to convert the universe into thumbtacks.

When they meet in the middle, they might be tempted to fight for the fate of the galaxy. But this has many disadvantages. First, there’s the usual risk of losing and being wiped out completely. Second, there’s the usual deadweight loss of war, devoting resources to military buildup instead of paperclip production or whatever. Third, there’s the risk of a Pyrrhic victory that leaves you weakened and easy prey for some third party. Fourth, nobody knows what kind of scorched-earth strategy a losing superintelligence might be able to use to thwart its conqueror, but it could potentially be really bad – eg initiating vacuum collapse and destroying the universe. Also, since both parties would have superintelligent prediction abilities, they might both know who would win the war and how before actually fighting. This would make the fighting redundant and kind of stupid.

Although they would have the usual peace treaty options, like giving half the universe to each of them, superintelligences that trusted each other would have an additional, more attractive option. They could merge into a superintelligence that shared the values of both parent intelligences in proportion to their strength (or chance of military victory, or whatever). So if there’s a 60% chance our AI would win, and a 40% chance their AI would win, and both AIs know and agree on these odds, they might both rewrite their own programming with that of a previously-agreed-upon child superintelligence trying to convert the universe to paperclips and thumbtacks in a 60-40 mix.

This has a lot of advantages over the half-the-universe-each treaty proposal. For one thing, if some resources were better for making paperclips, and others for making thumbtacks, both AIs could use all their resources maximally efficiently without having to trade. And if they were ever threatened by a third party, they would be able to present a completely unified front.
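
The efficiency point at the end of the quote can be illustrated with a toy model. Everything here is an assumption made for illustration: two regions with made-up production rates, a split treaty that divides every region 60-40, and a merged agent that maximizes a linear 60-40 weighted sum of paperclips and thumbtacks (one simple way to model the "60-40 mix", not necessarily what a real handshake would look like).

```python
# Each region: (units of resource, paperclips per unit, thumbtacks per unit).
regions = [
    (100, 2, 1),  # better for paperclips
    (100, 1, 2),  # better for thumbtacks
]

def split_treaty(regions, clip_pct):
    # Each AI takes its percentage of every region and uses it for its own goal.
    clips = sum(units * clip_pct * c / 100 for units, c, t in regions)
    tacks = sum(units * (100 - clip_pct) * t / 100 for units, c, t in regions)
    return clips, tacks

def merged_agent(regions, clip_pct):
    # One agent maximizing clip_pct * clips + (100 - clip_pct) * tacks,
    # so each region goes to whichever use scores higher under the weights.
    clips = tacks = 0
    for units, c, t in regions:
        if clip_pct * c >= (100 - clip_pct) * t:
            clips += units * c
        else:
            tacks += units * t
    return clips, tacks

print(split_treaty(regions, 60))
print(merged_agent(regions, 60))
```

In this toy case the merged agent produces more of both goods than the split treaty, because each region is used for what it is best at.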

Comment by mark-xu on How long does it takes to read the sequences? · 2020-10-17T17:49:51.379Z · LW · GW

pay what you want of new edition of 1st book:

entire sequences:

Comment by mark-xu on Diagonalization Fixed Point Exercises · 2020-10-17T16:56:42.451Z · LW · GW

Self-referential definitions can be constructed with the diagonal lemma. Given that the point of the exercise is to show something similar, you're right that this solution is probably a bit suspect.

Comment by mark-xu on Mark Xu's Shortform · 2020-10-17T06:39:14.482Z · LW · GW

Lesswrong posts that I want someone to write:

  1. Description of pair debugging
  2. Description of value handshakes

Maybe I'll think of more later.

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-17T06:16:23.278Z · LW · GW

The S. prior is a general-purpose prior which we can apply to any problem. The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don't know how that string will be interpreted.

The claim is that consequentialists in simulated universes will model decisions based on the Solomonoff prior, so they will know how that string will be interpreted.

Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?

Any decision that controls substantial resource allocation will do. For example, if we're evaluating the impact of running various programs, blowing up planets, interfering with alien life, etc.

Also in the category of "it's a feature, not a bug" is that, if you want your values to be right, and there's a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them. This argument implies that you can get that for free by using Solomonoff priors.

If you are a moral realist, this does seem like a possible feature of the Solomonoff prior.

Third, what do you mean by "the output" of a program that simulates a universe?

A TM that simulates a universe must also specify an output channel.

Take your example of Life--is the output a raster scan of the 2D bit array left when the universe goes static? In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors). Or is it the concatenation of all of the states it goes through, from start to finish?

All of the above. We are running all possible TMs, so all computable universes will be paired with all computable output channels. It's just a question of complexity.

Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit? I can't imagine any way that could happen, at least not if the universe is specified with a short instruction string.


This brings us to the 4th problem: It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won't even have enough memory to read in a single output string from most such simulations.

I agree that approximating the Solomonoff prior is difficult and thus its malignancy probably doesn't matter in practice. I do think similar arguments apply to cases that do matter.

5th, you can weigh each program's output proportional to 2^-T, where T is the number of steps it takes the TM to terminate. You've got to do something like that anyway, because you can't run TMs to completion one after another; you've got to do something like take a large random sample of TMs and iteratively run each one step. Problem solved.

See the section on the Speed prior.
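
For concreteness, the quoted scheme (run a sample of machines one step at a time and weigh each output by 2^-T) can be sketched as below. Toy generators stand in for TMs, and the usual program-length penalty is left out, so this is an illustration of the step weighting only, not the Speed prior itself.

```python
from collections import defaultdict

def countdown(n, output):
    """Toy stand-in for a TM: takes n steps, then halts with `output`."""
    for _ in range(n):
        yield  # one computation step
    return output

def forever():
    """Toy machine that never halts."""
    while True:
        yield

def dovetail(machines, max_rounds):
    # Run every machine one step per round; a machine that halts in
    # round T adds 2**-T to the weight of its output.
    weights = defaultdict(float)
    running = dict(enumerate(machines))
    for step in range(1, max_rounds + 1):
        for idx in list(running):
            try:
                next(running[idx])
            except StopIteration as halt:
                weights[halt.value] += 2.0 ** -step
                del running[idx]
    return dict(weights)

sample = [countdown(1, "0"), countdown(3, "1"), countdown(1, "1"), forever()]
result = dovetail(sample, max_rounds=10)
print(result)
```

The non-halting machine simply never contributes, which is why no machine has to be run to completion before the others get a turn.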

Perhaps the biggest problem is that you're talking about an entire universe of intelligent agents conspiring to change the "output string" of the TM that they're running in. This requires them to realize that they're running in a simulation, and that the output string they're trying to influence won't even be looked at until they're all dead and gone. That doesn't seem to give them much motivation to devote their entire civilization to twiddling bits in their universe's final output in order to shift our priors infinitesimally. And if it did, the more likely outcome would be an intergalactic war over what string to output.

They don't have to realize they're in a simulation, they just have to realize their universe is computable. Consequentialists care about their values after they're dead. The cost of influencing the prior might not be that high because they only have to compute it once and the benefit might be enormous. Exponential decay + acausal trade make an intergalactic war unlikely.

Comment by mark-xu on Babble challenge: 50 ways of hiding Einstein's pen for fifty years · 2020-10-15T20:07:33.652Z · LW · GW

Might be good to have people add buffer text to the beginning of their answers. Sidebar previews tend to give away the first 1/2 answers.

Comment by mark-xu on Babble challenge: 50 ways of hiding Einstein's pen for fifty years · 2020-10-15T20:04:40.738Z · LW · GW

Thanks for doing these. They're really fun.

  1. put it on a spacecraft that will be pretty much inaccessible for 50 years
  2. put it in a big box in a volcano
  3. bury it in a random place on the ground
  4. drop it into a deep ocean with a gps locator that will only activate after 50 years
  5. scan the pen and store digital copies in many places. It no longer matters if evil forces get access to the pen.
  6. sell Einstein a different pen
  7. become a pen collector. The evil forces will not know which pen to steal.
  8. put it on the moon
  9. go inside a locked room and somehow have energy to not have to leave for 50 years
  10. put it in your pocket
  11. purchase safety deposit boxes in every major bank
  12. break the pen into a bunch of pieces and scatter them across the globe
  13. melt part of the ice in Antarctica and freeze the pen there
  14. pay money to the forces of good to prevent the forces of evil from acquiring the pen
  15. store the pen in your bag of holding
  16. you probably don't have to hide it for evil forces to not be able to get it. It's pretty hard to find a random pen without other identifying markers and they can't afford to obtain every pen.
  17. fort knox
  18. put it in a lake of poison on an island surrounded by inferni
  19. stare at the pen for a long time to burn it into your memory. Destroy it and hope that neuroscience will advance far enough for the pen to be reconstructed from memory (and that your memory is reliable enough).
  20. spin it fast enough that no one can grab onto it for fifty years.
  21. sell it to Einstein today
  22. travel 50 years into the future
  23. eat it
  24. have surgery and put the pen somewhere in your body
  25. put the pen somewhere in someone else's body. Bonus points if they're part of the evil forces.
  26. put it in your ear
  27. lose it under the couch. Nothing ever gets found from under the couch in less than 50 years.
  28. Put it in the wet concrete of a random building that is currently being constructed.
  29. mail it to a confusing place and hope the UPS loses it for 50 years
  30. fly somewhere and hope the TSA loses it for 50 years
  31. put it in a shoebox under your bed
  32. give it to the evil forces immediately. Once they have it, they'll get bored of it. Acquire it again in 50 years.
  33. put it under your pillow along with children's teeth. The tooth fairy will take it. Buy it back with more teeth in 50 years.
  34. melt it in acetone. Sell the acetone to Einstein and tell him it's a pen. His confusion will cause him to write even more miraculous papers.
  35. put it in a dumpster with a gps locator that will activate itself in 50 years.
  36. cut open a young tree and put the pen inside, letting the tree grow around it.
  37. put it in a glass bottle and throw it into the ocean. By literary convention, someone will find it in 50 years.
  38. put the pen in a massive set of recursive envelopes, each one instructing the recipient to mail the set to the next person. Time it so that you get the pen back in 50 years.
  39. attend events that you think are very impactful. Wait for time travelers. Politely ask the time travelers to send the pen 50 years into the future.
  40. politely ask the evil forces to stop being evil.
  41. melt the pen and craft it into a different object. Reconstruct the pen in 50 years.
  42. use that spell from harry potter that prevents anyone from knowing the location of a building.
  43. tell everyone you know it's a treasured heirloom. Die. You will be buried with the pen. Have one of your descendants dig up the pen and sell it to Einstein.
  44. spend a lot of your time trying to hide a pencil instead. The evil forces will assume that they misremembered and are actually trying to obtain the pencil. Cackle.
  45. Replace each part of the pen with an identical part creating Theseus's Pen. The resulting philosophical confusion will drive the evil forces mad.
  46. Sell the evil forces the pen for a high price. Invest the money. 50 years later, you will be rich and easily able to buy the pen back.
  47. Join the evil forces. Now they have the pen, so they are content. Obtain a high rank in 50 years and sell the pen to Einstein.
  48. write Einstein's miracle papers for him using this pen. No need to sell it to Einstein anymore.
  49. Give the pen to some other babble challenge participant and ask them to hide it for you.
  50. become a pen influencer. Convince the world that this style of pen is the best style. There will be so many identical pens that the evil forces will not be able to find the specific pen they seek.
Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-15T18:28:37.139Z · LW · GW

Looks better - thanks!

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-15T01:04:18.995Z · LW · GW

perhaps would have been better worded as "the simplest way to specify the initial conditions of Earth is to specify the initial conditions of the universe, the laws of physics, and the location of Earth."

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-14T18:47:57.835Z · LW · GW

I am not so convinced that penalizing more stuff will make these arguments weak enough that we don't have to worry about them. For an example of why I think this, see "Are minimal circuits deceptive?" Also, adding execution/memory constraints penalizes all hypotheses, and I don't think universes with consequentialists are asymmetrically penalized.

I agree about this being a special case of mesa-optimization.

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-14T16:22:17.032Z · LW · GW

> In your section "complexity of conditioning", if I am understanding correctly, you compare the amount of information required to produce consequentialists with the amount of information in the observations we are conditioning on. This, however, is not apples to oranges: the consequentialists are competing against the "true" explanation of the data, the one that specifies the universe and where to find the data within it, they are not competing against the raw data itself. In an ordered universe, the "true" explanation would be shorter than the raw observation data, that's the whole point of using Solomonoff induction after all.

The data we're conditioning on has K-complexity of one megabyte. Maybe I didn't make this clear.

> So, there are two advantages the consequentialists can exploit to "win" and be the shorter explanation. This exploitation must be enough to overcome those 10-1000 bits. One is that, since the decision which is being made is very important, they can find the data within the universe without adding any further complexity. This, to me, seems quite malign, as the "true" explanation is being penalized simply because we cannot read data directly from the program which produces the universe, not because this universe is complicated.

I don't think I agree with this. Thinking in terms of consequentialists competing against "true" explanations doesn't make that much sense to me. It seems similar to making the exec hello world "compete" against the "true" print hello world.

The "complexity of consequentialists" section answers the question of "how long is the exec function?" where the "interpreter" exec calls is a universe filled with consequentialists.
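
The exec/print analogy can be made concrete with a toy sketch. This is only an illustration under loud assumptions: Python's `exec` stands in for the "interpreter", character count stands in for description length, and the names `run`, `direct`, and `indirect` are made up for this example.

```python
import contextlib
import io


def run(source: str) -> str:
    """Execute Python source and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# Direct description: the program just prints the string.
direct = "print('hello world')"

# Indirect description: the program hands a program to an interpreter,
# and that inner program prints the string.
indirect = "exec(\"print('hello world')\")"

# Both descriptions produce the same output.
same_output = run(direct) == run(indirect)

# The indirect description pays only for the wrapper around the inner program.
# Character count is a crude proxy for description length (an assumption).
overhead = len(indirect) - len(direct)

print(same_output, overhead)
```

In this picture, `overhead` is the analogue of "how long is the exec function?": the two programs are not rivals for being the "true" one, they both output the same data, and the question is only how many extra bits the wrapper costs.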

> However, if you could argue that the consequentialists actually had an advantage here which outweighed their own complexity, this would just sound to me like an argument that we are living in a simulation, because it would essentially be saying that our universe is unduly tuned to be valuable for consequentialists, to such a degree that the existence of these consequentialists is less of a coincidence than it just happening to be that valuable.

I do not understand what this is saying. I claim that consequentialists can reason about our universe by thinking about TMs because our universe is computable. Given that our universe supports life, it might thus be valuable to some consequentialists in other universes. I don't think the argument takes a stance on whether this universe is a simulation; it merely claims that this universe could be simulated.

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-14T16:07:55.345Z · LW · GW

I agree. The sentence quoted is a separate observation.

Comment by mark-xu on The Solomonoff Prior is Malign · 2020-10-14T16:05:43.821Z · LW · GW

Nope. Should be fixed now.

Comment by mark-xu on Babble challenge: 50 ways to escape a locked room · 2020-10-09T22:53:58.620Z · LW · GW

A potentially different style of babble challenge is "coming up with N opinions on <topic>". Examples are Jeffrey Ladish with 100 opinions about nuclear war and me with 100 opinions about emotions. 100 seems a bit large, but 50 seems manageable in one hour.

Comment by mark-xu on Babble challenge: 50 ways to escape a locked room · 2020-10-09T22:51:40.088Z · LW · GW

Numbers 23 and 37 made me chuckle. I also liked the resigned tone of 42. 

Comment by mark-xu on Babble challenge: 50 ways to escape a locked room · 2020-10-09T19:45:13.935Z · LW · GW

1. Break the 4th wall and step out of the hypothetical scenario

2. Call someone with my phone and ask them to get me out

3. Program a copy of my brain into my phone and upload myself

4. Broadcast enough information about myself to allow for reconstruction outside the box

5. Use my phone to convince enough people to think in similar ways to me that my consciousness is, on average, outside the box

6. Kick the door for 10 years. It'll probably open.

7. A phone with 10 years of battery can likely be turned into a potent bomb.

8. Meditate until you have escaped the chains of earthly desire.

9. Wake up.

10. Break the phone, turn it into a screwdriver, unscrew the hinges of the door.

11. Fashion your phone into a tool that can be used to pop the pins out of the hinges.

12. Fashion your fingernails into a lock pick and pick the lock.

13. Ask very politely for the person outside to open the door.

14. Escape life by dying.

15. Tunnel out through the ground using your phone as a shovel.

16. Luckily, the door can be unlocked from the inside. Open the door and walk out.

17. Develop telekinesis (somehow) and use that to open the door.

18. The map is the territory. Use your phone to draw a picture of the cell with the door open.

19. The sheer improbability of having enough energy to not need food or water for 10 years makes this location a prime spot for future people to travel back in time to. Sit in the cell and wait for said people to appear and get you out.

20. Since you have so much energy, you can conclude that you're likely in a simulation. One reason you might be in a simulation is to test how people can escape cells. If you're a very boring simulation, they might stop simulating you. Congratulations, you've escaped from future simulations.

21. You might also be in a simulation for decision-making purposes. Think very hard about what decisions might be dependent on your actions and manipulate the simulation to cause yourself to escape.

22. Your body has a large amount of energy in it (by magic?). Heat is energy, so transmute this energy into heat and burn your way out of the door.

23. Clearly some sort of weird physics violating things are happening. Do some more physics violating things and just step through the wall. (this is allowed I promise).

24. Unscrew the doorknob from the door and open it.

25. Luckily the people who made the room forgot to install a ceiling (how foolish of them). Simply climb your way out.

26. A battery that can contain 10 years of charge must be powered by Pym particles. Use those to shrink yourself and climb under the door.

27. Declare that the inside of the room is actually the outside and it's the rest of the world that's "inside". Your semantic trickery has let you escape.

28. Discover jacobjacob's lesswrong password and change the Oct 7th babble challenge to say "you find yourself in an unlocked room". Since the room is unlocked, escape is trivial.

29. Use the anthropic principle to quantum tunnel out of the room (I promise this is a meaningful sentence).

30. Use your phone to divide by zero, creating a paradox that consumes the walls of the room.

31. Download the game "break out" on your phone and use it to break out of the room.

32. Break the window and climb out of the room (the room has a window, I promise).

33. You have enough energy to not need food/water for 10 years. You're clearly a superhero. Just punch the wall and it'll break.

34. For a phone with so much battery power, the flashlight is actually an extremely powerful laser. Use this laser to cut your way out of the room.

35. Simply use the wifi you've been given to convince someone to let you out of the box.

36. Good thing you've developed the habit of wearing clothing that contains thermite explosives. Use those to bust your way out of the room.

37. Making the walls of a room with paper kind of makes the lock redundant. Shrug, while you rip your way out of the room.

38. If you think hard enough, you can do the impossible. If getting out of the room is impossible, think hard enough to do it. If it's not, well then you can just do it.

39. Use one of the 1300 ways to go to the moon. You are no longer in the room.

40. Close your eyes and wander around the room long enough to get lost. Since the room is too small to get lost in, by logical necessity, you will now be outside of the room.

41. Enter a deep meditative trance that will extend the amount of energy you have indefinitely. Await the gradual corrosion of the cell around you.

42. You have wifi and you don't need to eat, is this really a place you want to escape? Freedom is all in the mind anyway...

43. Master lucid dreaming and go to sleep for a very long time.

44. Good thing your phone also doubles as a teleportation device. Use that to teleport your way out.

45. Wait till your parents get home to let you out. In the meantime, try to come up with a story to avoid embarrassment.

46. When the singularity happens, all beings will be rescued from suffering. Simply wait until then.

47. Credibly commit to not even trying to escape unless you get let out. Since the creator of the hypothetical scenario wants you to come up with ideas on how to escape, this commitment means there's no longer a point to leaving you in the room, so they will let you out.

48. Come up with so many ways to escape that you learn how to "think outside the box." Since the room is a box and you're currently thinking, you must now be outside the box.

49. Signal you have escaped so hard that reality gets confused and lets you escape.

50. Go onto the SCP wiki and modify your own containment procedures to include unlocking the door. Walk out after the door gets unlocked.

Comment by mark-xu on “Unsupervised” translation as an (intent) alignment problem · 2020-10-01T03:41:53.551Z · LW · GW

This seems not false but it also seems like you're emphasizing the wrong bits, e.g. I don't think we quite need the model to be transparent/"see how it's making decisions" to know how it will generalize. 

At some point, model M will have knowledge that should enable it to do X task. However, it's currently unclear how one would get M to do X in a way that doesn't implicitly trust the model to be doing something it might not be doing. It's a core problem of prosaic alignment to figure out how to get M to do X in a way that allows us to know that M is actually doing X-and-only-X instead of something else. 

Comment by mark-xu on Mati_Roy's Shortform · 2020-09-21T05:51:08.073Z · LW · GW

reminded me of Uriel explaining Kabbalah:


Comment by mark-xu on Understanding “Deep Double Descent” · 2020-09-21T05:34:37.003Z · LW · GW

gives a theoretical argument that suggests SGD will converge to a point that is very close in L2 norm to the initialization. Since NNs are often initialized with extremely small weights, this amounts to implicit L2 regularization.
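
One way to spell out the step from "close to the initialization" to "small norm" is the triangle inequality. This is a sketch, not the referenced argument itself: the symbols $w_0$ (initial weights), $w_T$ (the point SGD converges to), and $\epsilon$ (the bound on how far SGD moves) are notation assumed here for illustration.

```latex
\|w_T\|_2 \;\le\; \|w_0\|_2 + \|w_T - w_0\|_2 \;\le\; \|w_0\|_2 + \epsilon
```

If both $\|w_0\|_2$ and $\epsilon$ are small, the final weights have small L2 norm, which is the same region of weight space an explicit L2 penalty would favor.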

Comment by mark-xu on Capturing Ideas · 2020-09-10T16:49:51.169Z · LW · GW

Not OP, but I have a similar hotkey. I use Todoist as my capture system and mapped Alt+Super+o to the following script (is there a way to embed code in comments?):

wmctrl -x -a "Todoist"

xdotool keyup Alt Super o

xdotool type --clearmodifiers --delay=3 q

The script selects the Todoist window, releases the hotkey's keys, waits a tiny amount of time, and presses q (q is Todoist's hotkey for adding a task).

Comment by mark-xu on Capturing Ideas · 2020-09-10T16:45:31.573Z · LW · GW

I've found one of the main benefits of getting a virtual assistant type device (Alexa, Google Home) is allowing me to capture ideas by verbalizing them. This is especially useful if I'm falling asleep and don't want to pull out a notebook/phone. 

This looks like me saying things like "Alexa, add 'is it meaningful to say that winning the lottery is difficult' to my todo list".

Comment by mark-xu on Training Regime Day 0: Introduction · 2020-09-08T17:08:29.282Z · LW · GW

I went to a CFAR workshop more recently, so there might be some content that is slightly newer. Additionally, my sequence is not yet completed and I am worse at writing.

The most important thing about reading any such sequence is to actually practice the techniques. I suggest reading the sequence that is most likely to get you to do that. If you think both are equally likely, I would recommend the Hammertime sequence.

Comment by mark-xu on Do you vote based on what you think total karma should be? · 2020-08-24T16:19:38.806Z · LW · GW

copying my comment from

Note that this is in reference to voting on question answers.

> Downvoting in general confuses me, but I think that downvoting to 0 is appropriate if the answer isn't quite answering the question, but downvoting past zero doesn't make sense. Downvoting to 0 feels like saying "this isn't that helpful" whereas downvoting past 0 feels like "this is actively harmful".

Comment by mark-xu on Forecasting Thread: AI Timelines · 2020-08-22T18:48:00.807Z · LW · GW

My rough take:

3 buckets, similar to Ben Pace's 

  1. 5% chance that current techniques just get us all the way there, e.g. something like GPT-6 is basically AGI
  2. 10% chance AGI doesn't happen this century, e.g. humanity sort of starts taking this seriously and decides we ought to hold off + the problem being technically difficult enough that small groups can't really make AGI themselves
  3. 50% chance that something like current techniques and some number of new insights gets us to AGI. 

If I thought about this for 5 additional hours, I can imagine assigning the following ranges to the scenarios:

  1. [1, 25]
  2. [1, 30]
  3. [20, 80]

Comment by mark-xu on Covid 8/13: Same As It Ever Was · 2020-08-18T01:14:34.996Z · LW · GW

Texas confusion might be explained partially by a coding error.

Comment by mark-xu on What are some low-information priors that you find practically useful for thinking about the world? · 2020-08-07T18:22:07.486Z · LW · GW

Gwern's essay about how everything is correlated seems related/relevant:

Comment by mark-xu on What are some low-information priors that you find practically useful for thinking about the world? · 2020-08-07T18:20:28.387Z · LW · GW

This is personal to me, but I once took a class at school where all the problems were multiple choice, required a moderate amount of thought, and were relatively easy. I got 1/50 wrong, giving me a 2% base rate for making the class of dumb mistakes like misreading inequalities or circling the wrong answer.

This isn't quite a meta-prior, but it seemed sort of related?

Comment by mark-xu on Tools for keeping focused · 2020-08-05T06:03:00.238Z · LW · GW

One of my similar tools is trying to avoid keeping my phone in my pocket. Using my phone is a fine thing to do, but having my default state be "can use my phone within 5 seconds" is generally distracting and causes more phone use than necessary. For this reason, I own an iPod touch: I needed access to my calendar/todo-list at all times, but didn't want to keep my phone on me.

Comment by mark-xu on What are your greatest one-shot life improvements? · 2020-08-01T18:13:59.850Z · LW · GW

Using duct tape to tape my floss to my toothpaste has moved my flossing compliance from ~80% -> ~98%