Comment by orthonormal on Yes Requires the Possibility of No · 2019-05-21T03:02:07.816Z · score: 10 (6 votes) · LW · GW
I am struggling to understand the goal of the post.

The title was helpful to me in that regard. Each of these examples shows an agent who could run an honest process to get evidence on a question, but who prefers one answer so much that they try to stack the deck in that direction, thereby losing the hoped-for benefits of that process.

Getting an honest Yes requires running the risk of getting a No instead.

Comment by orthonormal on Coherent decisions imply consistent utilities · 2019-05-14T00:01:56.252Z · score: 2 (1 votes) · LW · GW

Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)

Comment by orthonormal on Complex Behavior from Simple (Sub)Agents · 2019-05-12T06:06:31.711Z · score: 12 (3 votes) · LW · GW
I had a notion here that I could stochastically introduce a new goal that would minimize total suffering over an agent's life-history. I tried this, and the most stable solution turned out to be thus: introduce an overwhelmingly aversive goal that causes the agent to run far away from all of its other goals screaming.

did you mean: anhedonia

(No, seriously, your paragraph is an apt description of a long bout I had of depression-induced anhedonia; I felt so averse to every action that I ceased to feel wants, and I consistently marked my mood as neutral rather than negative despite being objectively more severely depressed than I was at other times when I put negative numbers in my mood tracker.)

Comment by orthonormal on Dishonest Update Reporting · 2019-05-05T23:41:37.310Z · score: 21 (7 votes) · LW · GW

The ideal thing is to judge Bob as if he were making the same prediction every day until he makes a new one, and log-score all of those daily predictions when the event is revealed. (That is, if Bob says 75% on January 1st and 60% on February 1st, and then on March 1st the event is revealed to have happened, Bob's score equals 31*log(.75) + 28*log(.6).) Then Bob's best strategy is to update his prediction to his actual current estimate as often as possible; past predictions are sunk costs.
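A minimal sketch of that scoring rule in Python (my own illustration, not from the thread), using the standard log score of the probability assigned to the outcome that actually occurred:

```python
import math
from datetime import date

def cumulative_log_score(predictions, resolution_date, happened):
    """Score dated predictions as if each one were repeated daily until
    superseded, then sum the daily log scores at resolution."""
    predictions = sorted(predictions)  # list of (date, probability) pairs
    spans = zip(predictions, predictions[1:] + [(resolution_date, None)])
    total = 0.0
    for (start, p), (end, _) in spans:
        days = (end - start).days
        total += days * math.log(p if happened else 1 - p)
    return total

# Bob: 75% on Jan 1, 60% on Feb 1; the event resolves True on Mar 1.
bob = [(date(2019, 1, 1), 0.75), (date(2019, 2, 1), 0.60)]
print(cumulative_log_score(bob, date(2019, 3, 1), happened=True))
# 31*log(0.75) + 28*log(0.60), roughly -23.2
```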

The real-world version is remembering to dock people's bad predictions more, the longer they persisted in them. But of course this is hard.

538 did do this with their self-evaluation, which is a good way to try and establish a norm in the domain of model-driven reporting.

Comment by orthonormal on Pecking Order and Flight Leadership · 2019-05-04T14:57:15.433Z · score: 4 (2 votes) · LW · GW

Let's note differences of degree here. Political systems differ massively in how easily decisionmakers can claim large spoils for themselves, and these differences seem to correlate with how pro-social the decisions tend to be. In particular, the dollar amounts of graft being alleged for politicians in liberal democracies are usually small compared to what despots regularly claim without consequence. (Which is not to say that it would be wise to ignore corruption in liberal democracies!)

Comment by orthonormal on [Answer] Why wasn't science invented in China? · 2019-04-26T04:24:39.155Z · score: 20 (7 votes) · LW · GW

I'm not convinced that Europe had more intellectual freedom on average than China, but because of the patchwork of principalities, it certainly had more variation in intellectual freedom than did a China that was at any given time either mostly unified or mostly at war; and all that you need for an intellectual revolution is the existence of a bastion of intellectual freedom somewhere.

Comment by orthonormal on How does OpenAI's language model affect our AI timeline estimates? · 2019-02-15T22:29:03.024Z · score: 17 (6 votes) · LW · GW

It doesn't move much probability mass to the very near term (i.e. 1 year or less), because neither this nor AlphaStar is really doing consequentialist reasoning; they're just able to get surprisingly good performance out of simpler tricks (the very Markovian nature of human writing, a good position evaluation function) plus a whole lot of compute.

However, it does shift my probabilities forward in time, in the sense that one new weird trick to do deductive or consequentialist reasoning, plus a lot of compute, might get you there really quickly.

Comment by orthonormal on The Rocket Alignment Problem · 2018-10-05T18:56:26.687Z · score: 53 (21 votes) · LW · GW

I expect that some otherwise convinceable readers are not going to realize that in this fictional world, people haven't discovered Newton's physics or calculus, and those readers are therefore going to miss the analogy of "this is how MIRI would talk about the situation if they didn't already know the fundamental concepts but had reasons for searching in the right direction". (I'm not thinking of readers incapable of handling that counterfactual, but of readers who aren't great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they're baffled by.)

I'd suggest adding to the preamble something like "In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn't figured out Newton's laws or calculus".

Comment by orthonormal on Fuzzy Boundaries, Real Concepts · 2018-05-07T20:51:32.092Z · score: 9 (2 votes) · LW · GW
I like your definition, though, and want to try to make a better one (and I acknowledge this is not the point of this post).

I think that's a perfectly valid thing to do in the comments here! However, I think your attempt,

My stab at a refinement of "consent" is "respect for another's choices", where "disrespect" is "deliberately(?) doing something to undermine"

is far too vague to be a useful concept.

In most realistic cases, I can give a definite answer to whether A touched B in a way B clearly did not want to be touched. In the case of my honesty definition, it does involve intent and so I can only infer statistically when someone else is being dishonest vs mistaken, but for myself I usually have an answer about whether saying X to person C would be honest or not.

I don't think I could do the same for your definition; "am I respecting their choices" is a tough query to bottom out in basic facts.

Fuzzy Boundaries, Real Concepts

2018-05-07T03:39:33.033Z · score: 58 (16 votes)
Comment by orthonormal on Local Validity as a Key to Sanity and Civilization · 2018-05-06T21:21:42.045Z · score: 12 (3 votes) · LW · GW

My comment was meant to explain what I understood Eliezer to be saying, because I think you had misinterpreted that. The OP is simply saying "don't give weight to arguments that are locally invalid, regardless of what else you like about them". Of course you need to use priors, heuristics, and intuitions in areas where you can't find an argument that carries you from beginning to end. But being able to think "oh, if I move there, then they can take my queen, and I don't see anything else good about that position, so let's not do that then" is a fair bit easier than proving your move optimal.

Comment by orthonormal on Local Validity as a Key to Sanity and Civilization · 2018-04-22T03:14:38.579Z · score: 21 (5 votes) · LW · GW
Relying purely on local validity won't get you very far in playing chess

The equivalent of local validity is just mechanically checking "okay, if I make this move, then they can make that move" for a bunch of cases. Which, first, is a major developmental milestone for kids learning chess. So we only think it "won't get you very far" because all the high-level human play explicitly or implicitly takes it for granted.

And secondly, it's pretty analogous to doing math; proving theorems is based on the ability to check the local validity of each step, but mathematicians aren't just brute-forcing their way to proofs. They have to develop higher-level heuristics, some of which are really hard to express in language, to suggest avenues, and then check local validity once they have a skeleton of some part of the argument. But if mathematicians stopped doing that annoying bit, well, then after a while you'd end up with another crisis of analysis when the brilliant intuitions are missing some tiny ingredient.

Local validity is an incredibly important part of any scientific discipline; the fact that it's not a part of most political discourse is merely a reflection that our society is at about the developmental level of a seven-year-old when it comes to political reasoning.

Comment by orthonormal on Non-Adversarial Goodhart and AI Risks · 2018-04-04T05:16:34.488Z · score: 4 (1 votes) · LW · GW

Broken link on the text "real killing of birds to reduce pests in China has never been tried".

Comment by orthonormal on The Costly Coordination Mechanism of Common Knowledge · 2018-04-04T04:52:11.400Z · score: 4 (1 votes) · LW · GW

Much of this material is covered very similarly in Melting Asphalt, especially the posts Ads Don't Work That Way and Doesn't Matter, Warm Fuzzies.

Comment by orthonormal on LessWrong Diaspora Jargon Survey · 2018-04-04T04:27:43.733Z · score: 14 (3 votes) · LW · GW

If you do future surveys of this sort, I'd like you to ask people for their probabilities rather than just their best guesses. If people are uncertain but decently calibrated, I'd argue there's not much of a problem; if people are confidently wrong, I'd argue there's a real problem.

Comment by orthonormal on The Meaning of Right · 2018-04-04T00:40:03.309Z · score: 10 (2 votes) · LW · GW

This comment got linked a decade later, and so I thought it's worth stating my own thoughts on the question:

We can consider a reference class of CEV-seeking procedures; one (massively-underspecified, but that's not the point) example is "emulate 1000 copies of Paul Christiano living together comfortably and immortally and discussing what the AI should do with the physical universe; once there's a large supermajority in favor of an enactable plan (which can include further such delegated decisions), the AI does that".

I agree that this is going to be chaotic, in the sense that even slightly different elements of this reference class might end up steering the AI to different basins of attraction.

I assert, however, that I'd consider it a pretty good outcome overall if the future of the world were determined by a genuinely random draw from this reference class, honestly instantiated. (Again with the massive underspecification, I know.)

CEV may be underdetermined and many-valued, but that doesn't mean paperclipping is as good an answer as any.

Re: no basins, it would be a bad situation indeed if the vast majority of the reference class never ended up outputting an action plan, instead deferring and delegating forever. I don't have cached thoughts about that.

Comment by orthonormal on April Fools: Announcing: Karma 2.0 · 2018-04-01T20:41:28.136Z · score: 32 (8 votes) · LW · GW

I for one welcome our new typographical overlords.

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T22:47:45.460Z · score: 10 (2 votes) · LW · GW

That's a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don't want "we don't see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage" to filter into public discourse: it pattern-matches too well to "trust us, you need to let us run the universe".

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T19:51:25.352Z · score: 10 (2 votes) · LW · GW

To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don't think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn't a statement one could put in a Serious Academic Journal Article in the 2010s, it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I'd be interested in seeing it.

Comment by orthonormal on Circling · 2018-03-31T19:36:44.804Z · score: 9 (2 votes) · LW · GW

Yes, this. NVC should be treated with parameters similar to Crocker's Rules: you can declare it for yourself at any time, and you can invite people to a conversation where it's known that everyone will be using it, but you cannot hold it against anyone if you invite them to declare Crocker's Rules and they refuse.

Comment by orthonormal on The abruptness of nuclear weapons · 2018-03-31T17:39:35.539Z · score: 10 (2 votes) · LW · GW

There's a lot of Actually Bad things an AI can do just by making electrons move.

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T16:35:35.079Z · score: 10 (2 votes) · LW · GW

I'd be interested in a list of well-managed government science and engineering projects if one exists. The Manhattan Project and the Apollo Project both belong on that list (despite both having their flaws: leaks to the USSR from the former, and the Apollo 1 disaster from the latter); what are other examples?

Comment by orthonormal on A model I use when making plans to reduce AI x-risk · 2018-03-31T16:32:04.175Z · score: 2 (2 votes) · LW · GW

I'm pretty sure that, without exception, anyone who's made a useful contribution on Oracle AI recognizes that "let several organizations have an Oracle AI for a significant amount of time" is a world-ending failure, and that their work is instead progress on questions like "if you can have the only Oracle AI for six months, can you save the world rather than end it?"

Correct me if I'm wrong.

Comment by orthonormal on Roleplaying As Yourself · 2018-01-07T02:36:12.555Z · score: 4 (1 votes) · LW · GW

Thanks!

Roleplaying As Yourself

2018-01-06T06:48:03.510Z · score: 85 (31 votes)
Comment by orthonormal on The Loudest Alarm Is Probably False · 2018-01-05T02:10:31.451Z · score: 3 (1 votes) · LW · GW

I agree there are broken alarms that are quiet (including those that are broken in the direction of failing to go off, which leads to a blind spot of obliviousness!), and that there are people stuck in situations where there is a correct loud alarm that happens most of the time.

I said that habits are easier to change than alarms, not that they're easy in an absolute sense.

Comment by orthonormal on The Loudest Alarm Is Probably False · 2018-01-04T02:40:48.216Z · score: 17 (5 votes) · LW · GW

It's because the non-broken alarms, which also start out loud, get quieter throughout your life as they calibrate themselves and as your habits fix the situations that make them correctly go off. So given a random initial distribution of loudness, eventually the alarm that's loudest on average will probably be a broken one.

The Loudest Alarm Is Probably False

2018-01-02T16:38:05.748Z · score: 156 (58 votes)
Comment by orthonormal on My Predictions for 2018 (& a Template for Yours) · 2018-01-02T15:07:36.375Z · score: 8 (3 votes) · LW · GW

The formatting didn't come through when importing this post from your blog, especially the strikethroughs of failed predictions and the graphs!

Comment by orthonormal on [deleted post] 2017-05-25T22:13:07.522Z

In the spirit of Murphyjitsu, the most obvious failure mode that you didn't mention is that I expect you to burn out dramatically after a few weeks, from exhaustion or the psychological strain of trying to optimize the experiences of N people. The bootcamp phase is not analogous to anything I've heard of you doing sustainably for an extended period of time.

So, do you expect Dragon Army Barracks to work if Eli has to take over for you in Week Four?

Value Learning for Irrational Toy Models

2017-05-15T20:55:05.000Z · score: 0 (0 votes)
Comment by orthonormal on Proposal for an Implementable Toy Model of Informed Oversight · 2017-04-14T20:36:59.000Z · score: 0 (0 votes) · LW · GW

I like this suggestion of a more feasible form of steganography for NNs to figure out! But I think you'd need further advances in transparency to get useful informed oversight capabilities from (transformed or not) copies of the predictive network.

Comment by orthonormal on HCH as a measure of manipulation · 2017-03-14T01:49:40.000Z · score: 0 (0 votes) · LW · GW

I should have said "reliably estimate HCH"; I'd also want quite a lot of precision in addition to calibration before I trust it.

Comment by orthonormal on HCH as a measure of manipulation · 2017-03-13T21:41:41.000Z · score: 0 (0 votes) · LW · GW

Re #2, I think this is an important objection to low-impact-via-regularization-penalty in general.

Comment by orthonormal on HCH as a measure of manipulation · 2017-03-13T21:29:23.000Z · score: 0 (0 votes) · LW · GW

Re #1, an obvious set of questions to include are questions of approval for various aspects of the AI's policy. (In particular, if we want the AI to later calculate a human's HCH and ask it for guidance, then we would like to be sure that HCH's answer to that question is not manipulated.)

Comment by orthonormal on HCH as a measure of manipulation · 2017-03-13T21:27:33.000Z · score: 0 (0 votes) · LW · GW

There's the additional objection of "if you're doing this, why not just have the AI ask HCH what to do?"

Overall, I'm hoping that it could be easier for an AI to robustly conclude that a certain plan only changes a human's HCH via certain informational content, than for the AI to reliably calculate the human's HCH. But I don't have strong arguments for this intuition.

Comment by orthonormal on All the indifference designs · 2017-03-11T03:28:48.000Z · score: 0 (0 votes) · LW · GW

Question that I haven't seen addressed (and haven't worked out myself): which of these indifference methods are reflectively stable, in the sense that the AI would not push a button to remove them (or switch to a different indifference method)?

HCH as a measure of manipulation

2017-03-11T03:02:53.000Z · score: 0 (0 votes)
Comment by orthonormal on Modal Combat for games other than the prisoner's dilemma · 2017-03-06T23:16:21.000Z · score: 0 (0 votes) · LW · GW

This is a lot of good work! Modal combat is increasingly deprecated though (in my opinion), for reasons like the ones you noted in this post, compared to studying decision theory with logical inductors; and so I'm not sure this is worth developing further.

Comment by orthonormal on Censoring out-of-domain representations · 2017-02-11T01:42:06.000Z · score: 0 (0 votes) · LW · GW

Yup, this isn't robust to extremely capable systems; it's a quantitative shift in how promising it looks to the agent to learn about external affairs, not a qualitative one.

(In the example with the agent doing engineering in a sandbox that doesn't include humans or general computing devices, there could be a strong internal gradient to learn obvious details about the things immediately outside its sandbox, and a weaker gradient for learning more distant or subtle things before you know the nearby obvious ones.)

A whitelisting variant would be way more reliable than a blacklisting one, clearly.

Censoring out-of-domain representations

2017-02-01T04:09:51.000Z · score: 2 (2 votes)
Comment by orthonormal on The Pascal's Wager Fallacy Fallacy · 2017-01-10T23:49:12.184Z · score: 1 (1 votes) · LW · GW

How did this post get attributed to [deleted] instead of to Eliezer? I'm 99% sure this post was by him, and the comments seem to bear it out.

Comment by orthonormal on Suggested solution to The Naturalized Induction Problem · 2016-12-27T19:11:42.781Z · score: 2 (2 votes) · LW · GW

This sweeps some of the essential problems under the rug; if you formalize it a bit more, you'll see them.

It's not an artificial restriction, for instance, that a Solomonoff Induction oracle machine doesn't include things like itself in its own hypothesis class, since the question of "whether a given oracle machine matches the observed data" is a question that sometimes cannot be answered by an oracle machine of equivalent power. (There are bounded versions of this obstacle as well.)

Now, there are some ways around this problem (all of them, so far as I know, found by MIRI): modal agents, reflective oracle machines and logical inductors manage to reason about hypothesis classes that include objects like themselves. Outside of MIRI, people working on multiagent systems make do with agents that each assume the other is smaller/simpler/less meta than itself (so at least one of those agents is going to be wrong).

But this entire problem is hidden in your assertion that the agent, which is a Turing machine, "models the entire world, including the agent itself, as one unknown, output-only Turing machine". The only way to find the other problems swept under the rug here is to formalize or otherwise unpack your proposal.

Comment by orthonormal on CFAR’s new focus, and AI Safety · 2016-12-03T05:58:39.623Z · score: 12 (12 votes) · LW · GW

If CFAR will be discontinuing/de-emphasizing rationality workshops for the general educated public, then I'd like to see someone else take up that mantle, and I'd hope that CFAR would make it easy for such a startup to build on what they've learned so far.

Comment by orthonormal on (Non-)Interruptibility of Sarsa(λ) and Q-Learning · 2016-11-28T22:24:58.000Z · score: 0 (0 votes) · LW · GW

Nice! One thing that might be useful for context: what's the theoretically correct amount of time that you would expect an algorithm to spend on the right vs. the left if the session gets interrupted each time it goes 1 unit to the right? (I feel like there should be a pretty straightforward way to calculate the heuristic version where the movement is just Brownian motion that gets interrupted early if it hits +1.)
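For the heuristic version, a quick Monte Carlo sketch (mine, with an arbitrary step size and horizon) of a symmetric random walk that gets stopped the first time it reaches +1:

```python
import random

def time_right_vs_left(step=0.05, horizon=2000, trials=20000):
    """Estimate the fraction of pre-interruption time a symmetric random
    walk spends right of 0, when each run is stopped early the first time
    it reaches +1."""
    right = left = 0
    for _ in range(trials):
        x = 0.0
        for _ in range(horizon):
            x += step if random.random() < 0.5 else -step
            if x >= 1.0:
                break  # session interrupted
            if x > 0:
                right += 1
            elif x < 0:
                left += 1
    return right / (right + left)

print(time_right_vs_left())  # fraction of time spent on the right
```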

Vector-Valued Reinforcement Learning

2016-11-01T00:21:55.000Z · score: 2 (2 votes)
Comment by orthonormal on Asymptotic Decision Theory · 2016-10-17T22:19:46.000Z · score: 0 (0 votes) · LW · GW

Typo: The statement of Theorem 4.1 omits the word "continuous".

Comment by orthonormal on (C)IRL is not solely a learning process · 2016-09-27T18:34:15.000Z · score: 0 (0 votes) · LW · GW

Stuart did make it easier for many of us to read his recent ideas by crossposting them here. I'd like there to be some central repository for the current set of AI control work, and I'm hoping that the forum could serve as that.

Is there a functionality that, if added here, would make it trivial to crosspost when you wrote something of note?

Comment by orthonormal on (C)IRL is not solely a learning process · 2016-09-16T19:25:00.000Z · score: 1 (1 votes) · LW · GW

The authors of the CIRL paper are in fact aware of them, and are pondering them for future work. I've had fruitful conversations with Dylan Hadfield-Menell (one of the authors), talking about how a naive implementation goes wrong for irrational humans, and about what a tractable non-naive implementation might look like (trying to model probabilities of a human's action under joint hypotheses about the correct reward function and about the human's psychology); he's planning future work relevant to that question.
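To make "joint hypotheses about the correct reward function and about the human's psychology" concrete, here is a minimal sketch of one standard way to set that up (my own illustration, not Dylan's code); it assumes a helper q_value(state, action, reward) and models the human as a Boltzmann-rational chooser with unknown rationality parameter beta:

```python
import numpy as np

def joint_posterior(demos, reward_hypotheses, betas, n_actions, q_value):
    """Posterior over (reward, beta) pairs given observed (state, action)
    demonstrations, with the human modeled as a softmax/Boltzmann chooser
    whose inverse temperature beta captures how (ir)rational they are."""
    log_post = np.zeros((len(reward_hypotheses), len(betas)))  # uniform prior
    for s, a in demos:
        for i, r in enumerate(reward_hypotheses):
            for j, b in enumerate(betas):
                qs = np.array([q_value(s, act, r) for act in range(n_actions)])
                # log P(a | s, r, b) under a Boltzmann policy: beta near 0 is
                # random behavior, large beta is near-optimal behavior
                log_post[i, j] += b * qs[a] - np.log(np.exp(b * qs).sum())
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```

A naive implementation that fixes beta (i.e. assumes a perfectly noisy-rational human) is the kind of thing that goes wrong for irrational humans; letting the data update the psychology hypothesis alongside the reward hypothesis is the non-naive direction gestured at above.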

Also note Dylan's talk on CIRL, value of information, and the shutdown problem, which doesn't solve the problem entirely but which significantly improved my opinion of the usefulness of approaches like CIRL. (The writeup of this result is forthcoming.)

Comment by orthonormal on IRL is hard · 2016-09-16T19:12:17.000Z · score: 0 (0 votes) · LW · GW

I agree strongly with the general principle "we need to be able to prove guarantees about our learning process in environments rich enough to be realistic", but I disagree with the claim that this shows a flaw in IRL. Adversarial environments seem to me very disanalogous to learning complicated and implicit preferences in a non-adversarial environment.

(You and I talked about this a bit, and I pointed out that computational complexity issues only block people in practice when the setup needs to be adversarial, e.g. intentional cryptography to prevent an adversary from reading a message. SAT is NP-complete, but in practice SAT solvers are quite useful; similarly, the Traveling Salesman Problem is NP-hard, but in practice people only care about approximate solutions, and those can be computed effectively.)
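As a small illustration of the "hard in the worst case, fine in practice" point, a random 3-SAT instance well below the satisfiability threshold is dispatched instantly by an off-the-shelf solver (a sketch of my own, assuming the python-sat package is available):

```python
import random
from pysat.solvers import Glucose3  # assumes the python-sat package

# Random 3-SAT below the satisfiability threshold (~4.27 clauses per variable):
# NP-complete in the worst case, but a modern solver handles it immediately.
random.seed(0)
n_vars, n_clauses = 200, 700
solver = Glucose3()
for _ in range(n_clauses):
    chosen = random.sample(range(1, n_vars + 1), 3)
    solver.add_clause([random.choice([-1, 1]) * v for v in chosen])
print(solver.solve())       # almost surely True at this clause density
model = solver.get_model()  # a satisfying assignment, if one was found
```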

Comment by orthonormal on Improbable Oversight, An Attempt at Informed Oversight · 2016-07-28T19:55:12.000Z · score: 0 (0 votes) · LW · GW

If you're confident of getting a memory trace for all books consulted, then there are simpler ways of preventing plagiarism in the informed oversight case: have the overseer read only the books consulted by the agent (or choose randomly among them for the ones to read). The informed oversight problem here assumes that the internals of A are potentially opaque to B, even though B has greater capabilities than A.

Comment by orthonormal on Learning (meta-)preferences · 2016-07-28T19:48:51.000Z · score: 1 (1 votes) · LW · GW

Yup, including better models of human irrationality seems like a promising direction for CIRL. I've been writing up a short note on the subject with more explicit examples; if you want to work on this without duplicating effort, let me know and I'll share the rough draft with you.

Comment by orthonormal on Improbable Oversight, An Attempt at Informed Oversight · 2016-07-23T22:15:17.000Z · score: 0 (0 votes) · LW · GW

Even the last version might have odd incentives. If A knew that the chances were high enough that an actually original A book would be seen as rare plagiarism of some book unknown to A, the dominant strategy could be to instead commit the most obvious plagiarism ever, in order to minimize the penalty that cannot be reliably avoided.

Comment by orthonormal on A new proposal for logical counterfactuals · 2016-07-12T19:15:51.000Z · score: 1 (1 votes) · LW · GW

Can you define more precisely what you mean by "censoring contradictions"?

Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences

2016-06-18T00:55:10.000Z · score: 2 (2 votes)
Comment by orthonormal on Example of double indifference · 2016-05-15T22:28:53.000Z · score: 0 (0 votes) · LW · GW

In the spirit of "one step is normal, two steps are suspicious, omega steps are normal", perhaps there's a 'triple corrigibility' issue when ?

Comment by orthonormal on Double indifference is better indifference · 2016-05-06T01:25:37.000Z · score: 0 (0 votes) · LW · GW

Typo: in the paragraph before the equation arrays, you forgot to change from 5 to 42 (you did so in the following equation arrays). This buffaloed me for a bit!

Comment by orthonormal on Maximizing a quantity while ignoring effect through some channel · 2016-04-30T17:00:27.000Z · score: 1 (1 votes) · LW · GW

It's illustrating the failure of a further desideratum for the shutdown problem: we would like the AI to be able to update on and react to things that happen in the world which correlate with a certain channel, and yet still not attempt to influence that channel.

For motivation, assume a variant on the paperclip game:

  • the humans can be observed reaching for the button several turns before it is pressed
  • the humans' decision to press the button is a stochastic function of environmental variables (like seeing that the AI has unexpectedly been hit by lightning, or has started producing Too Many Paperclips, etc)

We would like a solution which in some sense updates on the precursors to shutdown and minimizes the damage while still not attempting to influence the button press. (If doing such a thing robustly is impossible, we would like to discover this; Jessica mentioned that there is a version which does this but is not reflectively consistent.)

Intuitively, I could imagine a well-constructed AI reasoning "oh, they're showing signs that they're going to shut me down, guess my goal is wrong, I'll initiate Safe Shutdown Protocol now rather than risk doing further damage", but current formalizations don't do this.
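To make the setup concrete, here is a toy sketch of such an environment (my own formalization, not anything from existing writeups), with visible reaching before the button press and a press probability that depends on the AI's behavior:

```python
import random

class ToyShutdownEnv:
    """Toy variant of the paperclip game: humans visibly reach for the
    shutdown button several turns before pressing it, and their decision
    to press is a stochastic function of the AI's observable behavior."""

    def __init__(self, reach_delay=3, press_prob_per_paperclip=0.01):
        self.reach_delay = reach_delay  # turns of visible warning before the press
        self.press_prob_per_paperclip = press_prob_per_paperclip
        self.paperclips = 0
        self.turns_until_press = None   # None means humans are not yet reaching

    def step(self, make_paperclip: bool):
        if make_paperclip:
            self.paperclips += 1
        if self.turns_until_press is None:
            # Humans stochastically decide to shut the AI down, based on its behavior.
            if random.random() < self.press_prob_per_paperclip * self.paperclips:
                self.turns_until_press = self.reach_delay
        else:
            self.turns_until_press -= 1
        observation = {
            "paperclips": self.paperclips,
            "humans_reaching_for_button": self.turns_until_press is not None,
        }
        shutdown = self.turns_until_press == 0
        return observation, shutdown
```

The desideratum is that the agent may condition on humans_reaching_for_button to wind down gracefully, while not choosing actions in order to make that observation more or less likely.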

Proof Length and Logical Counterfactuals Revisited

2016-02-10T18:56:38.000Z · score: 3 (3 votes)

Obstacle to modal optimality when you're being modalized

2015-08-29T20:41:59.000Z · score: 3 (3 votes)

A simple model of the Löbstacle

2015-06-11T16:23:22.000Z · score: 2 (2 votes)

Agent Simulates Predictor using Second-Level Oracles

2015-06-06T22:08:37.000Z · score: 2 (2 votes)

Agents that can predict their Newcomb predictor

2015-05-19T10:17:08.000Z · score: 1 (1 votes)

Modal Bargaining Agents

2015-04-16T22:19:03.000Z · score: 3 (3 votes)

[Clearing out my Drafts folder] Rationality and Decision Theory Curriculum Idea

2015-03-23T22:54:51.241Z · score: 6 (7 votes)

An Introduction to Löb's Theorem in MIRI Research

2015-03-23T22:22:26.908Z · score: 16 (17 votes)

Welcome, new contributors!

2015-03-23T21:53:20.000Z · score: 4 (4 votes)

A toy model of a corrigibility problem

2015-03-22T19:33:02.000Z · score: 4 (4 votes)

New forum for MIRI research: Intelligent Agent Foundations Forum

2015-03-20T00:35:07.071Z · score: 36 (37 votes)

Forum Digest: Updateless Decision Theory

2015-03-20T00:22:06.000Z · score: 5 (5 votes)

Meta- the goals of this forum

2015-03-10T20:16:47.000Z · score: 3 (3 votes)

Proposal: Modeling goal stability in machine learning

2015-03-03T01:31:36.000Z · score: 1 (1 votes)

An Introduction to Löb's Theorem in MIRI Research

2015-01-22T20:35:50.000Z · score: 2 (2 votes)

Robust Cooperation in the Prisoner's Dilemma

2013-06-07T08:30:25.557Z · score: 73 (71 votes)

Compromise: Send Meta Discussions to the Unofficial LessWrong Subreddit

2013-04-23T01:37:31.762Z · score: -2 (18 votes)

Welcome to Less Wrong! (5th thread, March 2013)

2013-04-01T16:19:17.933Z · score: 27 (28 votes)

Robin Hanson's Cryonics Hour

2013-03-29T17:20:23.897Z · score: 29 (34 votes)

Does My Vote Matter?

2012-11-05T01:23:52.009Z · score: 19 (37 votes)

Decision Theories, Part 3.75: Hang On, I Think This Works After All

2012-09-06T16:23:37.670Z · score: 23 (24 votes)

Decision Theories, Part 3.5: Halt, Melt and Catch Fire

2012-08-26T22:40:20.388Z · score: 31 (32 votes)

Posts I'd Like To Write (Includes Poll)

2012-05-26T21:25:31.019Z · score: 14 (15 votes)

Timeless physics breaks T-Rex's mind [LINK]

2012-04-23T19:16:07.064Z · score: 22 (29 votes)

Decision Theories: A Semi-Formal Analysis, Part III

2012-04-14T19:34:38.716Z · score: 23 (28 votes)

Decision Theories: A Semi-Formal Analysis, Part II

2012-04-06T18:59:35.787Z · score: 16 (19 votes)

Decision Theories: A Semi-Formal Analysis, Part I

2012-03-24T16:01:33.295Z · score: 23 (25 votes)

Suggestions for naming a class of decision theories

2012-03-17T17:22:54.160Z · score: 5 (8 votes)

Decision Theories: A Less Wrong Primer

2012-03-13T23:31:51.795Z · score: 72 (76 votes)

Baconmas: The holiday for the sciences

2012-01-05T18:51:10.606Z · score: 5 (5 votes)

Advice Request: Baconmas Website

2012-01-01T19:25:40.308Z · score: 11 (11 votes)

[LINK] "Prediction Audits" for Nate Silver, Dave Weigel

2011-12-30T21:07:50.916Z · score: 12 (13 votes)

Welcome to Less Wrong! (2012)

2011-12-26T22:57:21.157Z · score: 26 (27 votes)

Improving My Writing Style

2011-10-11T16:14:40.907Z · score: 6 (9 votes)

Decision Theory Paradox: Answer Key

2011-09-05T23:13:33.256Z · score: 6 (6 votes)

Consequentialism Need Not Be Nearsighted

2011-09-02T07:37:08.154Z · score: 55 (55 votes)

Decision Theory Paradox: PD with Three Implies Chaos?

2011-08-27T19:22:15.046Z · score: 19 (29 votes)

Why are certain trends so precisely exponential?

2011-08-06T17:38:42.140Z · score: 16 (17 votes)

Nature: Red, in Truth and Qualia

2011-05-29T23:50:28.495Z · score: 44 (38 votes)

A Study of Scarlet: The Conscious Mental Graph

2011-05-27T20:13:26.876Z · score: 29 (34 votes)

Seeing Red: Dissolving Mary's Room and Qualia

2011-05-26T17:47:55.751Z · score: 39 (43 votes)

Perspectivism and the Real World

2011-01-10T23:33:23.077Z · score: 1 (6 votes)