AlphaStar: Impressive for RL progress, not for AGI progress 2019-11-02T01:50:27.208Z · score: 88 (40 votes)
orthonormal's Shortform 2019-10-31T05:24:47.692Z · score: 9 (1 votes)
Fuzzy Boundaries, Real Concepts 2018-05-07T03:39:33.033Z · score: 58 (16 votes)
Roleplaying As Yourself 2018-01-06T06:48:03.510Z · score: 85 (31 votes)
The Loudest Alarm Is Probably False 2018-01-02T16:38:05.748Z · score: 162 (63 votes)
Value Learning for Irrational Toy Models 2017-05-15T20:55:05.000Z · score: 0 (0 votes)
HCH as a measure of manipulation 2017-03-11T03:02:53.000Z · score: 1 (1 votes)
Censoring out-of-domain representations 2017-02-01T04:09:51.000Z · score: 2 (2 votes)
Vector-Valued Reinforcement Learning 2016-11-01T00:21:55.000Z · score: 2 (2 votes)
Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences 2016-06-18T00:55:10.000Z · score: 2 (2 votes)
Proof Length and Logical Counterfactuals Revisited 2016-02-10T18:56:38.000Z · score: 3 (3 votes)
Obstacle to modal optimality when you're being modalized 2015-08-29T20:41:59.000Z · score: 3 (3 votes)
A simple model of the Löbstacle 2015-06-11T16:23:22.000Z · score: 2 (2 votes)
Agent Simulates Predictor using Second-Level Oracles 2015-06-06T22:08:37.000Z · score: 2 (2 votes)
Agents that can predict their Newcomb predictor 2015-05-19T10:17:08.000Z · score: 1 (1 votes)
Modal Bargaining Agents 2015-04-16T22:19:03.000Z · score: 3 (3 votes)
[Clearing out my Drafts folder] Rationality and Decision Theory Curriculum Idea 2015-03-23T22:54:51.241Z · score: 6 (7 votes)
An Introduction to Löb's Theorem in MIRI Research 2015-03-23T22:22:26.908Z · score: 16 (17 votes)
Welcome, new contributors! 2015-03-23T21:53:20.000Z · score: 4 (4 votes)
A toy model of a corrigibility problem 2015-03-22T19:33:02.000Z · score: 4 (4 votes)
New forum for MIRI research: Intelligent Agent Foundations Forum 2015-03-20T00:35:07.071Z · score: 36 (37 votes)
Forum Digest: Updateless Decision Theory 2015-03-20T00:22:06.000Z · score: 5 (5 votes)
Meta- the goals of this forum 2015-03-10T20:16:47.000Z · score: 3 (3 votes)
Proposal: Modeling goal stability in machine learning 2015-03-03T01:31:36.000Z · score: 1 (1 votes)
An Introduction to Löb's Theorem in MIRI Research 2015-01-22T20:35:50.000Z · score: 2 (2 votes)
Robust Cooperation in the Prisoner's Dilemma 2013-06-07T08:30:25.557Z · score: 73 (71 votes)
Compromise: Send Meta Discussions to the Unofficial LessWrong Subreddit 2013-04-23T01:37:31.762Z · score: -2 (18 votes)
Welcome to Less Wrong! (5th thread, March 2013) 2013-04-01T16:19:17.933Z · score: 27 (28 votes)
Robin Hanson's Cryonics Hour 2013-03-29T17:20:23.897Z · score: 29 (34 votes)
Does My Vote Matter? 2012-11-05T01:23:52.009Z · score: 19 (37 votes)
Decision Theories, Part 3.75: Hang On, I Think This Works After All 2012-09-06T16:23:37.670Z · score: 23 (24 votes)
Decision Theories, Part 3.5: Halt, Melt and Catch Fire 2012-08-26T22:40:20.388Z · score: 31 (32 votes)
Posts I'd Like To Write (Includes Poll) 2012-05-26T21:25:31.019Z · score: 14 (15 votes)
Timeless physics breaks T-Rex's mind [LINK] 2012-04-23T19:16:07.064Z · score: 22 (29 votes)
Decision Theories: A Semi-Formal Analysis, Part III 2012-04-14T19:34:38.716Z · score: 23 (28 votes)
Decision Theories: A Semi-Formal Analysis, Part II 2012-04-06T18:59:35.787Z · score: 16 (19 votes)
Decision Theories: A Semi-Formal Analysis, Part I 2012-03-24T16:01:33.295Z · score: 23 (25 votes)
Suggestions for naming a class of decision theories 2012-03-17T17:22:54.160Z · score: 5 (8 votes)
Decision Theories: A Less Wrong Primer 2012-03-13T23:31:51.795Z · score: 72 (76 votes)
Baconmas: The holiday for the sciences 2012-01-05T18:51:10.606Z · score: 5 (5 votes)
Advice Request: Baconmas Website 2012-01-01T19:25:40.308Z · score: 11 (11 votes)
[LINK] "Prediction Audits" for Nate Silver, Dave Weigel 2011-12-30T21:07:50.916Z · score: 12 (13 votes)
Welcome to Less Wrong! (2012) 2011-12-26T22:57:21.157Z · score: 26 (27 votes)
Improving My Writing Style 2011-10-11T16:14:40.907Z · score: 6 (9 votes)
Decision Theory Paradox: Answer Key 2011-09-05T23:13:33.256Z · score: 6 (6 votes)
Consequentialism Need Not Be Nearsighted 2011-09-02T07:37:08.154Z · score: 55 (55 votes)
Decision Theory Paradox: PD with Three Implies Chaos? 2011-08-27T19:22:15.046Z · score: 19 (29 votes)
Why are certain trends so precisely exponential? 2011-08-06T17:38:42.140Z · score: 16 (17 votes)
Nature: Red, in Truth and Qualia 2011-05-29T23:50:28.495Z · score: 44 (38 votes)
A Study of Scarlet: The Conscious Mental Graph 2011-05-27T20:13:26.876Z · score: 29 (34 votes)


Comment by orthonormal on Autism And Intelligence: Much More Than You Wanted To Know · 2019-11-14T06:39:20.278Z · score: 13 (8 votes) · LW · GW

The other explanation I've heard bandied about is a polygenic version of sickle-cell anemia (where being heterozygous for the allele protects you from malaria but being homozygous gives you an awful disease).

In this model, there are a bunch of alleles that all push the phenotype in roughly the same direction, and having some of them is good, but past some threshold fitness starts rapidly declining.

(Further speculation: the optimum threshold is higher in the environment of civilization than in the ancestral environment, so these genes are experiencing massive positive selection over the last 10,000 years.)

This isn't a causal explanation, but it differs from the tower-foundation model in claiming that there's not a separate weakness to go looking for, just an optimum that's being surpassed, especially when two people near the optimum have children.

Comment by orthonormal on Experiments and Consent · 2019-11-11T02:49:59.496Z · score: 4 (2 votes) · LW · GW

The NTSB report was released last week, showing that Uber's engineering was doing some things very wrong (with specifics that had not been reported before). Self-driving programs shouldn't go on public roads with that kind of system, even with a driver ready to take over.

Comment by orthonormal on AlphaStar: Impressive for RL progress, not for AGI progress · 2019-11-10T07:29:32.005Z · score: 2 (1 votes) · LW · GW

Exactly. It seems like you need something beyond present imitation learning and deep reinforcement learning to efficiently learn strategies whose individual components don't benefit you, but which have a major effect if assembled perfectly together.

(I mean, don't underestimate gradient descent with huge numbers of trials - the genetic version did evolve a complicated eye in such a way that every step was a fitness improvement; but the final model has a literal blind spot that could have been avoided if it were engineered in another way.)

Comment by orthonormal on Recent updates to (2016-2017) · 2019-11-10T06:30:35.565Z · score: 2 (1 votes) · LW · GW

I just finished Story of Your Life, and I disagree with your attempt to retcon it as a physically possible story; I think the evidence is clear that Chiang intended it to be understood the way that most people understand it, and that he in fact was misunderstanding* the interaction between the variational principle and causality.

[Spoilers ahead.]

Firstly, it is a really plausible misunderstanding. If you don't deeply grok how decoherence works, then the principle of least action can really seem like it requires information to behave in nonsequential ways. (But in fact the Schrödinger equation exhibits causality, and variational principles are just the result of amplitudes nearly cancelling out for all paths whose action is not a local extremum.) Chiang is a gifted amateur in physics, so this misconception is not that unlikely.
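To spell out the physics claim (a standard textbook sketch, nothing specific to the story): in the path-integral picture, the amplitude to go from $x_i$ to $x_f$ sums over all paths, and least-action behavior emerges from interference rather than from any foresight:

```latex
A(x_i \to x_f) \;=\; \int \mathcal{D}x \; e^{\,i S[x]/\hbar},
\qquad \text{dominant contributions satisfy } \delta S[x] = 0 ,
```

since paths along which $S$ varies by much more than $\hbar$ contribute rapidly oscillating phases that nearly cancel. The underlying dynamics remain locally causal in time, so the variational formulation gives no channel for information to flow backwards.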

Secondly, if your interpretation were his intended one, he could have done any number of things to suggest it! Louise directly tells the reader that this story is being told from the night of her daughter's conception, and that she now sees both future and past but does not act to change it. This is shown in the story to be the point of Heptapod B, and Chiang works at length to try and justify how understanding this language lets one understand (but not alter) the future. There's nothing else in the story to hint that Louise is an unreliable narrator.

If your interpretation were Chiang's, he would have to be intentionally misdirecting the audience to a degree you only see from authors like Nabokov, and not leaving any clue except for flawed science, which is common enough for dramatic license in science fiction that it really can't count as a clue. I doubt Chiang is doing that.

And finally, the whole thematic import of the story is ruined by your interpretation; the poignance comes from Louise's existentialist fatalism, giving life to the lines of the tragic play she is cast in, which is only meaningful insofar as she's telling the truth about perceiving forwards and backwards. (It's a variant on the Tralfamadorians, though interestingly Chiang hadn't yet read Slaughterhouse Five when he wrote this.) It's just a more boring story the way you see it!

*Possibly intentionally misunderstanding; I can imagine a science fiction author pestering their physics friend with "I've got a great story idea using this concept", the friend saying "It doesn't work that way", and the author asking "But if I pretend it does work that way, exactly how much physics knowledge would it take someone to be annoyed by it?"

Comment by orthonormal on orthonormal's Shortform · 2019-11-09T18:03:23.756Z · score: 4 (2 votes) · LW · GW

Decision-theoretic blackmail is when X gets Y to choose A over B, not via acting to make the consequences of A more appealing to Y, but by making the consequences of B less appealing to Y.

The exceptions to this definition are pretty massive, though, and I don't know a principled emendation that excludes them.

1. There's a contract / social contract / decision-theoretic equilibrium, and within that, B will be punished. (This may not be a true counterexample, because the true choice is whether to join the contract... though this is less clear for the social contract than for the other two.)

2. Precommitting not to give in to blackmail is not itself blackmail. Of course, in an ultimatum game both players can imagine themselves as doing this.

Can anyone think of more exceptions, or a redefinition that clearly excludes these?

Comment by orthonormal on The Credit Assignment Problem · 2019-11-09T17:54:34.116Z · score: 2 (1 votes) · LW · GW
Removing things entirely seems extreme.

Dropout is a thing, though.

Comment by orthonormal on The Credit Assignment Problem · 2019-11-08T22:06:50.991Z · score: 6 (3 votes) · LW · GW

Shapley Values [thanks Zack for reminding me of the name] are akin to credit assignment: you have a bunch of agents coordinating to achieve something, and then you want to assign payouts fairly based on how much each contribution mattered to the final outcome.

And the way you do this is, for each agent you look at how good the outcome would have been if everybody except that agent had coordinated, and then you credit each agent proportionally to how much the overall performance would have fallen off without them.

So what about doing the same here: send rewards to each contributor proportional to how much they improved the actual group decision (assessed by rerunning it without them and seeing how performance declines)?
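As a toy sketch (hypothetical code, with `value` standing in for whatever scores a coalition's performance), the exact Shapley computation and the cheaper rerun-without-them variant proposed above look like:

```python
from itertools import permutations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley values: average each agent's marginal
    contribution over every order in which agents could join."""
    totals = {a: 0.0 for a in agents}
    for order in permutations(agents):
        coalition = set()
        for a in order:
            before = value(frozenset(coalition))
            coalition.add(a)
            totals[a] += value(frozenset(coalition)) - before
    return {a: t / factorial(len(agents)) for a, t in totals.items()}

def leave_one_out(agents, value):
    """The rerun-without-them scheme: credit each agent with the
    performance drop when the group runs without them."""
    full = value(frozenset(agents))
    return {a: full - value(frozenset(set(agents) - {a})) for a in agents}
```

Note that the two schemes can disagree: if two agents are redundant backups for each other, leave-one-out credits each with nothing (removing either one alone costs nothing), while Shapley splits the credit between them.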

Comment by orthonormal on Elon Musk is wrong: Robotaxis are stupid. We need standardized rented autonomous tugs to move customized owned unpowered wagons. · 2019-11-04T22:08:16.964Z · score: 2 (1 votes) · LW · GW

Customer service human interactions don't feel especially valuable to me, compared to intentional human interactions or even seeing other people walking down the street.

Comment by orthonormal on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-11-04T17:24:41.488Z · score: 7 (3 votes) · LW · GW

Good comment. I disagree with this bit:

I would, for instance, predict that if Superintelligence were published during the era of GOFAI, all else equal it would've made a bigger splash because AI researchers then were more receptive to abstract theorizing.

And then it would probably have been seen as outmoded and thrown away completely when AI capabilities research progressed into realms that vastly surpassed GOFAI. I don't know that there's an easy way to get capabilities researchers to think seriously about safety concerns that haven't manifested on a sufficient scale yet.

Comment by orthonormal on AlphaStar: Impressive for RL progress, not for AGI progress · 2019-11-02T16:33:58.553Z · score: 13 (8 votes) · LW · GW

To copy myself in another thread, AlphaZero did some (pruned) game tree exploration in a hardcoded way that allowed the NN to focus on the evaluation of how good a given position was; this allowed it to kind of be a "best of both worlds" between previous algorithms like Stockfish and a pure deep reinforcement learner.
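The "hardcoded search plus learned evaluation" division of labor can be sketched like this (a toy depth-limited negamax; AlphaZero itself uses Monte Carlo tree search guided by the network's policy and value heads, so treat this as an illustration of the split, not the actual algorithm):

```python
def search(state, depth, evaluate, legal_moves, apply_move):
    """Depth-limited negamax: the hardcoded tree exploration calls a
    (learned) evaluate() at the leaves and propagates values back up.
    Returns (value for the player to move, best move or None)."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None
    best_value, best_move = float("-inf"), None
    for move in moves:
        value, _ = search(apply_move(state, move), depth - 1,
                          evaluate, legal_moves, apply_move)
        if -value > best_value:  # opponent's best outcome is our worst
            best_value, best_move = -value, move
    return best_value, best_move
```

Plugging in a trivial game (say, Nim where you remove one or two stones) shows the pattern: the search supplies the lookahead, and `evaluate()` only has to judge positions at the leaves.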

Re: your middle paragraph, I agree that you're correct about an RL agent doing metalearning, though we're also agreed that with current architectures it would take a prohibitive amount of computation to get anything like a competent general causal reasoner that way.

I'm not going to go up against your intuitions on imitation learning etc; I'm just surprised if you don't expect there's a necessary architectural advance needed to make anything like general causal reasoning emerge in practice from some combination of imitation learning and RL.

Comment by orthonormal on AlphaStar: Impressive for RL progress, not for AGI progress · 2019-11-02T16:23:07.707Z · score: 5 (3 votes) · LW · GW

AlphaZero did some (pruned) game tree exploration in a hardcoded way that allowed the NN to focus on the evaluation of how good a given position was; this allowed it to kind of be a "best of both worlds" between previous algorithms like Stockfish and a pure deep reinforcement learner.

This is impossible for a game with an action space as large as StarCraft II's, though; and modifying a game like Go enough to defeat that kind of tree search would make it a completely different game.

I'm not 100% sure about the example you raise, but it seems to me it's either going to have a decently prune-able game tree, or that humans won't be capable of playing the game at a very sophisticated level, so I'd expect AlphaZero-esque things to get superhuman at it. StarCraft is easier for humans relative to AIs because we naturally chunk concepts together (visually and strategically) that are tricky for the AI to learn.

Comment by orthonormal on AlphaStar: Impressive for RL progress, not for AGI progress · 2019-11-02T16:18:44.249Z · score: 12 (7 votes) · LW · GW
DeepMind says it hopes the techniques used to develop AlphaStar will ultimately help it "advance our research in real-world domains".
But Prof Silver said the lab "may rest at this point", rather than try to get AlphaStar to the level of the very elite players.

Comment by orthonormal on orthonormal's Shortform · 2019-11-02T01:43:21.019Z · score: 4 (2 votes) · LW · GW

Thanks, will do.

Comment by orthonormal on orthonormal's Shortform · 2019-11-01T16:32:52.993Z · score: 2 (1 votes) · LW · GW

Virtue ethics seems less easily applicable to the domain of "what governmental policies to support" than to the domain of personal behavior, so I had a hard time thinking of examples. Can you?

Comment by orthonormal on steve2152's Shortform · 2019-11-01T16:30:51.150Z · score: 4 (2 votes) · LW · GW

Right, for them "alignment" could mean their desired concept, "safe for everyone except our targets".

Comment by orthonormal on orthonormal's Shortform · 2019-11-01T16:24:35.569Z · score: 54 (11 votes) · LW · GW

DeepMind released their AlphaStar paper a few days ago, having reached Grandmaster level at the partial-information real-time strategy game StarCraft II over the summer.

This is very impressive, and yet less impressive than it sounds. I used to watch a lot of StarCraft II (I stopped interacting with Blizzard recently because of how they rolled over for China), and over the summer there were many breakdowns of AlphaStar games once players figured out how to identify the accounts.

The impressive part is getting reinforcement learning to work at all in such a vast state space: that took breakthroughs beyond what was necessary to solve Go and beat Atari games. AlphaStar had to have a rich enough set of potential concepts (in the sense that e.g. a convolutional net ends up having concepts of different textures) that it could learn a concept like "construct building P" or "attack unit Q" or "stay out of the range of unit R" rather than just "select spot S and enter key T". This is new and worth celebrating.

The overhyped part is that AlphaStar doesn't really do the "strategy" part of real-time strategy. Each race has a few solid builds that it executes at GM level, and the unit control is fantastic, but the replays don't look creative or even especially reactive to opponent strategies.

That's because there's no representation of causal thinking - "if I did X then they could do Y, so I'd better do X' instead". Instead there are many agents evolving together, and if there's an agent evolving to try Y then the agents doing X will be replaced with agents that do X'.

(This lack of causal reasoning especially shows up in building placement, where the consequences of locating any one building here or there are minor, but the consequences of your overall SimCity are major for how your units and your opponents' units would fare if they attacked you. In one comical case, AlphaStar had surrounded the units it was building with its own factories so that they couldn't get out to reach the rest of the map. Rather than lifting the buildings to let the units out, which is possible for Terran, it destroyed one building and then immediately began rebuilding it before it could move the units out!)

This means that, first, AlphaStar just doesn't have a decent response to strategies that it didn't evolve, and second, it doesn't do much in the way of a reactive decision tree of strategies (if I scout this, I do that). That kind of play is unfortunately very necessary for playing Zerg at a high level, so the internal meta has just collapsed into one where its Zerg agents predictably rush out early attacks that are easy to defend if expected. This has the flow-through effect that its Terran and Protoss are weaker against human Zerg than against other races, because they've never practiced against a solid Zerg that plays for the late game.

The end result cleaned up against weak players, performed well against good players, but practically never took a game against the top few players. I think that DeepMind realized they'd need another breakthrough to do what they did to Go, and decided to throw in the towel while making it look like they were claiming victory.

Finally, RL practitioners have known that genuine causal reasoning could never be achieved via existing RL architectures: you'd only ever get something that could execute the same policy as an agent that had reasoned that way, via a very expensive process of evolving away from dominated strategies at each step down the tree of move and countermove. It's the biggest known unknown on the way to AGI.
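The "agents evolving together" dynamic from a few paragraphs up can be illustrated with toy replicator dynamics (an illustrative sketch with made-up payoffs, not DeepMind's actual league training): Y exploits X, X' is safe against Y, and X gets displaced without any agent ever reasoning "if I did X then they could do Y":

```python
# Antisymmetric pairwise payoffs: Y beats X, X' beats Y, all else drawn.
PAYOFF = {("X", "Y"): -1.0, ("Y", "X"): 1.0,
          ("Y", "X'"): -1.0, ("X'", "Y"): 1.0}
STRATS = ["X", "Y", "X'"]

def step(shares, lr=0.1):
    """One replicator update: strategies scoring above the population
    average grow, the rest shrink; shares are renormalized to sum to 1."""
    fit = {s: sum(PAYOFF.get((s, t), 0.0) * shares[t] for t in STRATS)
           for s in STRATS}
    avg = sum(shares[s] * fit[s] for s in STRATS)
    new = {s: shares[s] * (1.0 + lr * (fit[s] - avg)) for s in STRATS}
    total = sum(new.values())
    return {s: v / total for s, v in new.items()}

shares = {"X": 0.98, "Y": 0.01, "X'": 0.01}
for _ in range(2000):
    shares = step(shares)
# Y booms by exploiting X, X collapses, then X' quietly displaces Y.
```

No individual agent models its opponent; the population as a whole just drifts away from dominated strategies, which is the expensive substitute for causal reasoning described above.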

Comment by orthonormal on orthonormal's Shortform · 2019-10-31T05:24:47.998Z · score: 10 (5 votes) · LW · GW

[Cross-posted from Medium, written for a pretty general audience]

There are many words that could describe my political positions. But there's one fundamental label for me: I am a consequentialist.

Consequentialism is a term from ethics; there, it means the position that consequences are what truly make an action right or wrong, rather than rules or virtues. What that means is that for me, the most essential questions about policy aren't things like "what is fair" or "what rights do people have", although these are good questions. For me, it all boils down to "how do we make people's lives better?"

(There are some bits of nuance to the previous paragraph, which I've kept as a long endnote.)

"Make people's lives better" isn't a platitude- there's a real difference here! To explain, I want to point out that there are both consequentialists and non-consequentialists within different political camps. Let's consider socialists first and then libertarians second.

Many socialists believe both that (A) the world is headed for plutocratic disaster unless capitalism is overthrown, and that (B) labor markets and massive wealth disparities would be crimes even if they did not doom others to suffering. The difference is that some are more motivated by beliefs like (A), and could thus change their positions if convinced that e.g. the Nordic model was much better for future growth than a marketless society; while others are more motivated by beliefs like (B), and would continue to support pure socialism even if they were convinced it would mean catastrophe.

And many libertarians believe both that (A') the only engine that can consistently create prosperity for all is a free market with no interference, and that (B') taxation is a monstrous act of aggression and theft. The difference is that some are more motivated by beliefs like (A'), and thus could change their position if convinced that e.g. progressive taxation and redistribution would not destroy the incentives behind economic growth; while others are more motivated by beliefs like (B'), and would continue to support pure libertarianism even if they were convinced it would mean catastrophe.

I find it fruitful to talk with the first kind of socialist and the first kind of libertarian, but not the second kind of either. The second type just isn’t fundamentally interested in thinking about the consequences (except insofar as they can convince others by arguing for certain consequences). But among the first type, it’s possible to figure out the truth together by arguing about historical cases, studying natural experiments in policy, and articulating different theories.

I hope it's been helpful to draw out this distinction; I'd encourage you to first find fellow consequentialists among your natural allies, and expand from there when and if you feel comfortable. There's a lot that can be done to make the world a better place, and those of us who care most about making the world better can achieve more once we find each other!

P.S. The above focuses on the sort of political questions where most people's influence is limited to voting and convincing others to vote with them. But there's more ways to have an effect than that; I'd like to take one last moment to recommend the effective altruism movement, which investigates the best ways for people to have a big positive impact on the world.


Nuance section:

the position that consequences are what truly make an action right or wrong

There's a naive version of this, which is that you should seize any good immediate outcome you can, even by doing horrific things. That's... not a healthy version of consequentialism. The way to be less naive is to care about long-term consequences, and also to expect that you can't get away with keeping your behavior secret from others in general. Here's a good account of what non-naive consequentialism can look like.

the most essential questions about policy aren't things like "what is fair" or "what rights do people have", although these are good questions

In particular, fairness and rights are vital to making people's lives better! We want more than just physical comforts; we want autonomy and achievement and meaning, we want to have trustworthy promises about what the world will ask of us tomorrow, and we want past injustices to be rectified. But these can be traded off, in extreme situations, against the other things that are important for people. In a massive emergency, I'd rather save lives in an unfair way and try to patch up the unfairness later, than let people die to preserve fairness.

how do we make people's lives better?

This gets complicated and weird when you apply it to things like our distant descendants, but there are some aspects in the world today that seem fairly straightforward. Our world has built an engine of prosperity that makes food and goods available to many, beyond what was dreamt of in the past. But many people in the world are still living short and painful lives filled with disease and starvation. Another dollar of goods will do much more for one of them than for one of us. If we can improve their lives without destroying that engine, it is imperative to do that. (What consequentialists mostly disagree on is how the engine really works, how it could be destroyed, and how it could be improved!)

Comment by orthonormal on Book summary: Unlocking the Emotional Brain · 2019-10-10T04:49:39.398Z · score: 15 (9 votes) · LW · GW

Well, damn. I started out reading this really skeptically, but the process fit my own past experiences of changing core beliefs frighteningly well. (The only thing that's ever noticeably worked for me is something like showing that one of my IFS parts is a hypocrite, which results in it more or less self-destructing and allowing change to happen.) I just emailed a Coherence Therapy practitioner, since I'm looking for a new therapist; this seems worth a try.

Comment by orthonormal on "Can We Survive Technology" by von Neumann · 2019-08-18T19:02:18.688Z · score: 53 (20 votes) · LW · GW

Quick approximate summary:

  • John von Neumann first says, essentially, that the Industrial Revolution has made the world smaller, and that in earlier centuries the problems caused by it were contained to nations, but that now they extend to the entire globe.
    • For a first example, he talks about cheap energy and industrial synthesis, though he predicts that nuclear fusion and transmutation would be much more practically feasible than they have turned out to be.
    • He briefly mentions expected major improvement in automation, communication, and transportation.
    • He then talks about anthropogenic climate change and the broad possibilities of geoengineering the climate.
    • All of these technologies can vastly improve human life, or destroy it.
  • He immediately rules out the "solution" of [preventing advances in technology] as both undesirable (because it blocks the positive uses) and impossible (because it would require total coordination and a total change in human values).
  • He next considers the possibility of permanently avoiding war through diplomacy etc, and does not think that the 1950s drive for world peace will last long; furthermore, such an initiative would need to adapt to ever-more-powerful technologies as fast as they are introduced.
  • He frames the upcoming decades as a dangerous but useful evolution, where we will either succeed or fail catastrophically, and doesn't sound especially optimistic. Our best hope is to innovate new political forms that are capable of handling major threats with patience, flexibility, and intelligence.
Comment by orthonormal on Keeping Beliefs Cruxy · 2019-07-29T23:18:54.604Z · score: 7 (4 votes) · LW · GW

Alas, double-cruxing is a two-person game; you can't make another person do it, you can only do it with someone who genuinely wants to. For everyone else, the best related trick I know is nonviolent communication.

Comment by orthonormal on The Real Rules Have No Exceptions · 2019-07-23T22:17:16.796Z · score: 19 (12 votes) · LW · GW

Meta: I approve of the practice of arguing against your own post in a comment.

Comment by orthonormal on Simple Rules of Law · 2019-07-16T04:33:44.801Z · score: 2 (1 votes) · LW · GW

How much less do you expect this to happen under the current system?

Comment by orthonormal on The AI Timelines Scam · 2019-07-11T04:59:45.435Z · score: 8 (6 votes) · LW · GW
both historically and now, criticism is often met with counterarguments based on "style" rather than engaging with the technical meat of the criticism

Is there any group of people who reliably don't do this? Is there any indication that AI researchers do this more often than others?

Comment by orthonormal on If physics is many-worlds, does ethics matter? · 2019-07-11T02:29:10.515Z · score: 13 (8 votes) · LW · GW

Eliezer's real answer to this question is discussed in Timeless Control. Basically, choice is still meaningful in many-worlds or any other physically deterministic universe. There are incredibly few Everett branches starting from here where tomorrow I go burn down an orphanage, and this is genuinely caused by the fact that I robustly do not want to do that sort of thing.

If you have altruistic motivation, then the Everett branches starting from here are in fact better (in expectation) than the branches starting from a similar universe with a version of you that has no altruistic motivation. By working to do good, you are in a meaningful sense causing the multiverse to contain a higher proportion of good worlds than it otherwise would.

It really does all add up to normality, even if it feels counterintuitive.

Comment by orthonormal on How I Ended Up Non-Ambitious · 2019-07-08T06:03:44.437Z · score: 4 (2 votes) · LW · GW

Well, this post aged interestingly for those of us who know the author (who ended up working for a high-profile EA organization for some time).

Comment by orthonormal on Causal Reality vs Social Reality · 2019-06-27T22:48:38.979Z · score: 2 (1 votes) · LW · GW


it is not the done the thing

it is not the done thing, perhaps?

Comment by orthonormal on Writing children's picture books · 2019-06-27T19:10:54.865Z · score: 14 (5 votes) · LW · GW

Maybe you can get the best of both worlds by imagining you're writing a children's book, but that your editor is in fact an expert on the subject and you don't want to embarrass yourself in front of them.

Comment by orthonormal on Quotes from Moral Mazes · 2019-06-04T00:55:40.036Z · score: 15 (5 votes) · LW · GW

And Robin Hanson was surprised that no big corporation wanted to implement a real prediction market?

Comment by orthonormal on Egoism In Disguise · 2019-06-04T00:37:18.486Z · score: 21 (5 votes) · LW · GW

This strongly resembles the argument given by Subhan in EY's post Is Morality Preference?, with a side order of Fake Selfishness. You might enjoy reading those posts along with others in their respective sequences. ("Is Morality Preference?" was part of the original metaethics sequence but didn't make the cut for Rationality: AI to Zombies.)

More to the point, the biggest mistake I see here is the one addressed in The Domain of Your Utility Function: yes, my moral preferences are a part of my map rather than the territory, but there's still a damn meaningful difference between egoism (preferences that point only to the part of my map labeled "my future experiences") and my actual moral preferences, which point to many other parts of the map as well.

Comment by orthonormal on Yes Requires the Possibility of No · 2019-05-21T03:02:07.816Z · score: 17 (11 votes) · LW · GW
I am struggling to understand the goal of the post.

The title was helpful to me in that regard. Each of these examples shows an agent who could run an honest process to get evidence on a question, but who prefers one answer so much that they try to stack the deck in that direction, thereby losing the hoped-for benefits of that process.

Getting an honest Yes requires running the risk of getting a No instead.

Comment by orthonormal on Coherent decisions imply consistent utilities · 2019-05-14T00:01:56.252Z · score: 2 (1 votes) · LW · GW

Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)

Comment by orthonormal on Complex Behavior from Simple (Sub)Agents · 2019-05-12T06:06:31.711Z · score: 13 (4 votes) · LW · GW
I had a notion here that I could stochastically introduce a new goal that would minimize total suffering over an agent's life-history. I tried this, and the most stable solution turned out to be thus: introduce an overwhelmingly aversive goal that causes the agent to run far away from all of its other goals screaming.

did you mean: anhedonia

(No, seriously, your paragraph is an apt description of a long bout I had of depression-induced anhedonia; I felt so averse to every action that I ceased to feel wants, and I consistently marked my mood as neutral rather than negative despite being objectively more severely depressed than I was at other times when I put negative numbers in my mood tracker.)

Comment by orthonormal on Dishonest Update Reporting · 2019-05-05T23:41:37.310Z · score: 21 (7 votes) · LW · GW

The ideal thing is to judge Bob as if he were making the same prediction every day until he makes a new one, and log-score all of those daily predictions when the event is revealed. (That is, if Bob says 75% on January 1st and 60% on February 1st, and then on March 1st the event is revealed to have happened, Bob's score equals 31*log(.75) + 28*log(.6).) Then Bob's best strategy is to update his prediction to his actual current estimate as often as possible; past predictions are sunk costs.
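A minimal sketch of this scoring scheme (the function name and input format are my own, hypothetical choices; the log is taken of the probability each standing prediction assigned to the realized outcome):

```python
import math

def cumulative_log_score(predictions, resolution_day, event_happened):
    """Score a sequence of updated predictions as if each one were
    repeated daily until superseded, then log-score every day.

    predictions: list of (day, p) pairs sorted by day, where p is the
                 stated probability that the event happens.
    """
    total = 0.0
    for i, (day, p) in enumerate(predictions):
        # This prediction stands until the next one (or until resolution).
        next_day = predictions[i + 1][0] if i + 1 < len(predictions) else resolution_day
        days_held = next_day - day
        prob_of_outcome = p if event_happened else 1.0 - p
        total += days_held * math.log(prob_of_outcome)
    return total

# Bob says 75% on Jan 1 (day 1) and 60% on Feb 1 (day 32);
# the event resolves as having happened on Mar 1 (day 60):
score = cumulative_log_score([(1, 0.75), (32, 0.60)], 60, True)
# score equals 31*log(0.75) + 28*log(0.6)
```

Note that under this scheme a prediction left standing for longer contributes proportionally more to the score, which is exactly what makes updating promptly the best strategy.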

The real-world version is remembering to dock people's bad predictions more, the longer they persisted in them. But of course this is hard.

538 did do this in their self-evaluation, which is a good way to try to establish a norm in the domain of model-driven reporting.

Comment by orthonormal on Pecking Order and Flight Leadership · 2019-05-04T14:57:15.433Z · score: 4 (2 votes) · LW · GW

Let's note differences of degree here. Political systems differ massively in how easily decisionmakers can claim large spoils for themselves, and these differences seem to correlate with how pro-social the decisions tend to be. In particular, the dollar amounts of graft being alleged for politicians in liberal democracies are usually small compared to what despots regularly claim without consequence. (Which is not to say that it would be wise to ignore corruption in liberal democracies!)

Comment by orthonormal on [Answer] Why wasn't science invented in China? · 2019-04-26T04:24:39.155Z · score: 30 (12 votes) · LW · GW

I'm not convinced that Europe had more intellectual freedom on average than China, but because of the patchwork of principalities, it certainly had more variation in intellectual freedom than did a China that was at any given time either mostly unified or mostly at war; and all that you need for an intellectual revolution is the existence of a bastion of intellectual freedom somewhere.

Comment by orthonormal on How does OpenAI's language model affect our AI timeline estimates? · 2019-02-15T22:29:03.024Z · score: 17 (6 votes) · LW · GW

It doesn't move much probability mass to the very near term (i.e. 1 year or less), because neither this nor AlphaStar is really doing consequentialist reasoning; they're just able to achieve surprisingly good performance with simpler tricks (the very Markovian nature of human writing, a good position evaluation function) given a whole lot of compute.

However, it does shift my probabilities forward in time, in the sense that one new weird trick to do deductive or consequentialist reasoning, plus a lot of compute, might get you there really quickly.

Comment by orthonormal on The Rocket Alignment Problem · 2018-10-05T18:56:26.687Z · score: 53 (21 votes) · LW · GW

I expect that some otherwise convinceable readers are not going to realize that in this fictional world, people haven't discovered Newton's physics or calculus, and those readers are therefore going to miss the analogy of "this is how MIRI would talk about the situation if they didn't already know the fundamental concepts but had reasons for searching in the right direction". (I'm not thinking of readers incapable of handling that counterfactual, but of readers who aren't great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they're baffled by.)

I'd suggest adding to the preamble something like "In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn't figured out Newton's laws or calculus".

Comment by orthonormal on Fuzzy Boundaries, Real Concepts · 2018-05-07T20:51:32.092Z · score: 9 (2 votes) · LW · GW
I like your definition, though, and want to try to make a better one (and I acknowledge this is not the point of this post).

I think that's a perfectly valid thing to do in the comments here! However, I think your attempt,

My stab at a refinement of "consent" is "respect for another's choices", where "disrespect" is "deliberately(?) doing something to undermine"

is far too vague to be a useful concept.

In most realistic cases, I can give a definite answer to whether A touched B in a way B clearly did not want to be touched. In the case of my honesty definition, it does involve intent and so I can only infer statistically when someone else is being dishonest vs mistaken, but for myself I usually have an answer about whether saying X to person C would be honest or not.

I don't think I could do the same for your definition; "am I respecting their choices" is a tough query to bottom out in basic facts.

Comment by orthonormal on Local Validity as a Key to Sanity and Civilization · 2018-05-06T21:21:42.045Z · score: 15 (5 votes) · LW · GW

My comment was meant to explain what I understood Eliezer to be saying, because I think you had misinterpreted that. The OP is simply saying "don't give weight to arguments that are locally invalid, regardless of what else you like about them". Of course you need to use priors, heuristics, and intuitions in areas where you can't find an argument that carries you from beginning to end. But being able to think "oh, if I move there, then they can take my queen, and I don't see anything else good about that position, so let's not do that then" is a fair bit easier than proving your move optimal.

Comment by orthonormal on Local Validity as a Key to Sanity and Civilization · 2018-04-22T03:14:38.579Z · score: 24 (7 votes) · LW · GW
Relying purely on local validity won't get you very far in playing chess

The equivalent of local validity is just mechanically checking "okay, if I make this move, then they can make that move" for a bunch of cases. Which, first, is a major developmental milestone for kids learning chess. So we only think it "won't get you very far" because all the high-level human play explicitly or implicitly takes it for granted.

And secondly, it's pretty analogous to doing math; proving theorems is based on the ability to check the local validity of each step, but mathematicians aren't just brute-forcing their way to proofs. They have to develop higher-level heuristics, some of which are really hard to express in language, to suggest avenues, and then check local validity once they have a skeleton of some part of the argument. But if mathematicians stopped doing that annoying bit, then after a while we'd end up with another crisis of analysis, when the brilliant intuitions turn out to be missing some tiny ingredient.

Local validity is an incredibly important part of any scientific discipline; the fact that it's not a part of most political discourse is merely a reflection that our society is at about the developmental level of a seven-year-old when it comes to political reasoning.

Comment by orthonormal on Non-Adversarial Goodhart and AI Risks · 2018-04-04T05:16:34.488Z · score: 4 (1 votes) · LW · GW

Broken link on the text "real killing of birds to reduce pests in China has never been tried".

Comment by orthonormal on The Costly Coordination Mechanism of Common Knowledge · 2018-04-04T04:52:11.400Z · score: 4 (1 votes) · LW · GW

Much of this material is covered very similarly in Melting Asphalt, especially the posts Ads Don't Work That Way and Doesn't Matter, Warm Fuzzies.

Comment by orthonormal on LessWrong Diaspora Jargon Survey · 2018-04-04T04:27:43.733Z · score: 14 (3 votes) · LW · GW

If you do future surveys of this sort, I'd like you to ask people for their probabilities rather than just their best guesses. If people are uncertain but decently calibrated, I'd argue there's not much of a problem; if people are confidently wrong, I'd argue there's a real problem.

Comment by orthonormal on The Meaning of Right · 2018-04-04T00:40:03.309Z · score: 10 (2 votes) · LW · GW

This comment got linked a decade later, and so I thought it's worth stating my own thoughts on the question:

We can consider a reference class of CEV-seeking procedures; one (massively-underspecified, but that's not the point) example is "emulate 1000 copies of Paul Christiano living together comfortably and immortally and discussing what the AI should do with the physical universe; once there's a large supermajority in favor of an enactable plan (which can include further such delegated decisions), the AI does that".

I agree that this is going to be chaotic, in the sense that even slightly different elements of this reference class might end up steering the AI to different basins of attraction.

I assert, however, that I'd consider it a pretty good outcome overall if the future of the world were determined by a genuinely random draw from this reference class, honestly instantiated. (Again with the massive underspecification, I know.)

CEV may be underdetermined and many-valued, but that doesn't mean paperclipping is as good an answer as any.

Re: no basins, it would be a bad situation indeed if the vast majority of the reference class never ended up outputting an action plan, instead deferring and delegating forever. I don't have cached thoughts about that.

Comment by orthonormal on April Fools: Announcing: Karma 2.0 · 2018-04-01T20:41:28.136Z · score: 32 (8 votes) · LW · GW

I for one welcome our new typographical overlords.

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T22:47:45.460Z · score: 10 (2 votes) · LW · GW

That's a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don't want "we don't see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage" to filter into public discourse: it pattern-matches too well to "trust us, you need to let us run the universe".

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T19:51:25.352Z · score: 10 (2 votes) · LW · GW

To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don't think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn't a statement one could put in a Serious Academic Journal Article in the 2010s, it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I'd be interested in seeing it.

Comment by orthonormal on Circling · 2018-03-31T19:36:44.804Z · score: 9 (2 votes) · LW · GW

Yes, this. NVC should be treated under ground rules similar to Crocker's Rules: you can declare it for yourself at any time, and you can invite people to a conversation where it's known that everyone will be using it, but you cannot hold it against anyone if you invite them to declare it and they refuse.

Comment by orthonormal on The abruptness of nuclear weapons · 2018-03-31T17:39:35.539Z · score: 10 (2 votes) · LW · GW

There's a lot of Actually Bad things an AI can do just by making electrons move.

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T16:35:35.079Z · score: 12 (3 votes) · LW · GW

I'd be interested in a list of well-managed government science and engineering projects if one exists. The Manhattan Project and the Apollo Project both belong on that list (despite both having their flaws: leaks to the USSR from the former, and the Apollo 1 disaster from the latter); what are other examples?