Comment by scott-garrabrant on How the MtG Color Wheel Explains AI Safety · 2019-02-17T02:07:14.332Z · score: 14 (4 votes) · LW · GW

I think informed oversight fits better with MtG white than it does with boxing. I agree that the three main examples are boxing-like, and informed oversight is not, but it still feels white to me.

I do think that corrigibility done right is a thing that is in some sense less agentic. I think that things that have their goals outside of them are less agentic than things that have their goals inside of them, but I think corrigibility is stronger than that. I want to say something like: a corrigible agent not only has its goals partially on the outside (in the human), but also has its decision theory partially on the outside. Idk.

How the MtG Color Wheel Explains AI Safety

2019-02-15T23:42:59.637Z · score: 51 (17 votes)
Comment by scott-garrabrant on How does Gradient Descent Interact with Goodhart? · 2019-02-04T03:44:54.558Z · score: 4 (2 votes) · LW · GW

Fixed, thanks.

How does Gradient Descent Interact with Goodhart?

2019-02-02T00:14:51.673Z · score: 65 (17 votes)
Comment by scott-garrabrant on Announcement: AI alignment prize round 3 winners and next round · 2018-12-20T03:43:49.952Z · score: 12 (3 votes) · LW · GW

Abram and I submit Embedded Agency.

Formal Open Problem in Decision Theory

2018-11-29T03:25:46.134Z · score: 29 (15 votes)

The Ubiquitous Converse Lawvere Problem

2018-11-29T03:16:16.453Z · score: 18 (8 votes)

Hyperreal Brouwer

2018-11-29T03:15:23.650Z · score: 24 (8 votes)

Fixed Point Discussion

2018-11-24T20:53:39.545Z · score: 35 (7 votes)
Comment by scott-garrabrant on Diagonalization Fixed Point Exercises · 2018-11-22T16:06:42.235Z · score: 6 (3 votes) · LW · GW

Yeah, it is just functions that take in two sentences and put both their Gödel numbers into a fixed two-input formula.
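
In symbols (just restating the above, with Φ a placeholder name): the functions in question are the maps (A, B) ↦ Φ(⌜A⌝, ⌜B⌝), where Φ(x, y) is some fixed formula with two free variables and ⌜·⌝ denotes the Gödel number of a sentence.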

Comment by scott-garrabrant on Iteration Fixed Point Exercises · 2018-11-22T16:02:32.661Z · score: 6 (3 votes) · LW · GW

Thanks, I actually wanted to get rid of the earlier condition that for all , and I did that.

Iteration Fixed Point Exercises

2018-11-22T00:35:09.885Z · score: 31 (9 votes)

Diagonalization Fixed Point Exercises

2018-11-18T00:31:19.683Z · score: 37 (10 votes)

Topological Fixed Point Exercises

2018-11-17T01:40:06.342Z · score: 69 (26 votes)

Fixed Point Exercises

2018-11-17T01:39:50.233Z · score: 49 (19 votes)

Embedded Agency (full-text version)

2018-11-15T19:49:29.455Z · score: 79 (27 votes)

Embedded Curiosities

2018-11-08T14:19:32.546Z · score: 74 (25 votes)
Comment by scott-garrabrant on Embedded Agents · 2018-11-07T19:45:34.084Z · score: 39 (8 votes) · LW · GW

This is not a complete answer, but it is part of my picture:

(It is the part of the picture that I can give while being only descriptive, and not prescriptive. For epistemic hygiene reasons, I want to avoid discussions of how much of different approaches we need in contexts (like this one) that would make me feel like I was justifying my research in a way that people might interpret as an official statement from the agent foundations team lead.)

I think that Embedded Agency is basically a refactoring of Agent Foundations in a way that gives one central curiosity-based goalpost, rather than making it look like a bunch of independent problems. It is mostly all the same problems, but it was previously packaged as "Here are a bunch of things we wish we understood about aligning AI," and is repackaged as "Here is a central mystery of the universe, and here are a bunch of things we don't understand about it." It is not a coincidence that they are the same problems, since they were generated in the first place by people paying close attention to what mysteries of the universe related to AI we haven't solved yet.

I think of Agent Foundations research as having a different type signature than most other AI Alignment research, in a way that looks kind of like Agent Foundations:other AI alignment::science:engineering. I think of AF as more forward-chaining and other stuff as more backward-chaining. This may seem backwards if you think about AF as reasoning about superintelligent agents, and other research programs as thinking about modern ML systems, but I think it is true. We are trying to build up a mountain of understanding, until we collect enough that the problem seems easier. Others are trying to make direct plans on what we need to do, see what is wrong with those plans, and try to fix the problems. One consequence of this is that AF work is more likely to be helpful given long timelines, partially because AF is trying to be the start of a long journey of figuring things out, but also because AF is more likely to be robust to huge shifts in the field.

I actually like to draw an analogy with this: (taken from this post by Evan Hubinger)

I was talking with Scott Garrabrant late one night recently and he gave me the following problem: how do you get a fixed number of DFA-based robots to traverse an arbitrary maze (if the robots can locally communicate with each other)? My approach to this problem was to come up with and then try to falsify various possible solutions. I started with a hypothesis, threw it against counterexamples, fixed it to resolve the counterexamples, and iterated. If I could find a hypothesis which I could prove was unfalsifiable, then I'd be done.
When Scott noticed I was using this approach, he remarked on how different it was than what he was used to when doing math. Scott's approach, instead, was to just start proving all of the things he could about the system until he managed to prove that he had a solution. Thus, while I was working backwards by coming up with possible solutions, Scott was working forwards by expanding the scope of what he knew until he found the solution.

(I don't think it quite communicates my approach correctly, but I don't know how to do better.)

A consequence of the type signature of Agent Foundations is that my answer to "What are the other major chunks of the larger problem?" is "That is what I am trying to figure out."

Comment by scott-garrabrant on Subsystem Alignment · 2018-11-07T18:52:53.598Z · score: 13 (8 votes) · LW · GW

So if we view an epistemic subsystem as a superintelligent agent that has control over the map and has the goal of making the map match the territory, one extreme failure mode is that it takes a hit to short-term accuracy by slightly modifying the map in such a way as to trick the things looking at the map into giving the epistemic subsystem more control. Then, once it has more control, it can use it to manipulate the territory to make the territory more predictable. If your goal is to minimize surprise, you should destroy all the surprising things.

Note that we would not make an epistemic system this way; a more realistic model of the goal of an epistemic system we would build is "make the map match the territory better than any other map in a given class," or even "make the map match the territory better than any small modification to the map." But a large point of the section is that if you search over strategies that "make the map match the territory better than any other map in a given class," at small scales this is the same as "make the map match the territory." So you might find "make the map match the territory" optimizers, and then go wrong in the way above.

I think all this is pretty unrealistic, and I expect you are much more likely to go off in a random direction than to end up with something that looks like a specific subsystem the programmers put in getting too much power and stably optimizing for what the programmers said. We would need to understand a lot more before we would even hit the failure mode of making a system where the epistemic subsystem was agentically optimizing what it was supposed to be optimizing.

Comment by scott-garrabrant on Robust Delegation · 2018-11-05T19:00:14.227Z · score: 22 (6 votes) · LW · GW

Some last minute emphasis:

We kind of open with how agents have to grow and learn and be stable, but talk most of the time about this two-agent problem, where there is an initial agent and a successor agent. When thinking about it as the succession problem, it seems like a bit of a stretch to call it a fundamental part of agency. The first two sections were about how agents have to make decisions and have models, and choosing a successor does not seem like as fundamental a part of agency. However, when you think of it as an agent having to stably continue to optimize over time, it seems a lot more fundamental.

So, I want to emphasize that when we say there are multiple forms of the problem, like choosing successors or learning/growing over time, the view in which these are different at all is a dualistic view. To an embedded agent, the future self is not privileged, it is just another part of the environment, so there is no difference between making a successor and preserving your own goals.

It feels very different to humans. This is because it is much easier for us to change ourselves over time than it is to make a clone of ourselves and change the clone, but that difference is not fundamental.

Comment by scott-garrabrant on Decision Theory · 2018-11-02T16:53:31.594Z · score: 4 (2 votes) · LW · GW

But how do you avoid proving with certainty that p=1/2?

Since your proposal does not say what to do if you find inconsistent proofs that the linear function is two different things, I will assume that if it finds multiple different proofs, it defaults to 5 for the following.

Here is another example:

You are in a 5 and 10 problem. You have a twin that is also in a 5 and 10 problem. You have exactly the same source code. There is a consistency checker, and if you and your twin do different things, you both get 0 utility.

You can prove that you and your twin do the same thing. Thus you can prove that the function is 5+5p. You can also prove that your twin takes 5 by Löb's theorem. (You can also prove that you take 5 by Löb's theorem, but you ignore that proof, since "there is always a chance".) Thus, you can prove that the function is 5-5p. Your system doesn't know what to do with two functions, so it defaults to 5. (If it is provable that you both take 5, you both take 5, completing the proof by Löb.)
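
Spelled out, with p your probability of taking 10 (as in the proposal under discussion): if you and your twin provably do the same thing, U(p) = 5(1-p) + 10p = 5 + 5p; if your twin takes 5, the consistency checker gives you 0 whenever you take 10, so U(p) = 5(1-p) + 0·p = 5 - 5p.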

I am doing the same thing as before, but because I put it outside of the agent, it does not get flagged with the "there is always a chance" module. This is trying to illustrate that your proposal takes advantage of a separation between the agent and the environment that was snuck in, and could be done incorrectly.

Two possible fixes:

1) You could say that the agent, instead of taking 5 when finding an inconsistency, takes some action that exhibits the inconsistency (something on which the two functions give different values). This is very similar to the chicken rule, and if you add something like this, you don't really need the rest of your system. If you take an agent that, whenever it proves it does something, does something else, this agent will prove (given enough time) that if it takes 5 it gets 5, and if it takes 10 it gets 10.

2) I had one proof system, and just ignored the proofs that I found that I did a thing. I could instead give the agent a special proof system that is incapable of proving what it does, but how do you do that? Chicken rule seems like the place to start.

One problem with the chicken rule is that it was developed in a system that was deductively closed, so you can't prove something that passes through a proof of P without proving P. If you violate this, by having a random theorem prover, you might have a system that fails to prove "I take 5" but proves "I take 5 and 1+1=2" and uses this to complete the Löb loop.

Comment by scott-garrabrant on Decision Theory · 2018-11-02T03:23:17.816Z · score: 6 (3 votes) · LW · GW

Sure. How do you do that?

Comment by scott-garrabrant on Decision Theory · 2018-11-01T23:53:25.250Z · score: 8 (4 votes) · LW · GW

My point was that I don't know where to assume the linearity is. Whenever I have private randomness, I have linearity over what I end up choosing with that randomness, but not linearity over what probability I choose. But I think this is not getting at the disagreement, so I pivot to:

In your model, what does it mean to prove that U is some affine function? If I prove that my probability p is 1/2 and that U=7.5, have I proven that U is the constant function 7.5? If there is only one value of p, it is not defined what the utility function is, unless I successfully carve the universe in such a way as to let me replace the action with various things and see what happens. (Or, assuming linearity, replace the probability with enough linearly independent things (in this case 2) to define the function.)
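
To make the underdetermination concrete: an affine U(p) = a + bp has two free parameters, and a single proven point gives only one constraint. Proving p = 1/2 and U = 7.5 yields a + b/2 = 7.5, which is satisfied by the constant function U = 7.5 but equally by U(p) = 5 + 5p; you need evaluations at (at least) two distinct values of p to pin the function down.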

Comment by scott-garrabrant on Decision Theory · 2018-11-01T21:43:56.035Z · score: 11 (6 votes) · LW · GW

Yeah, so it's like you have this private data, which is an infinite sequence of bits, and if you see all 0's you take an exploration action. I think that by giving the agent these private bits and promising that the bits do not change the rest of the world, you are essentially giving the agent access to a causal counterfactual that you constructed. You don't even have to mix with what the agent actually does; you can explore with every action and ask if it is better to explore and take 5 or explore and take 10. By doing this, you are essentially giving the agent access to a causal counterfactual, because conditioning on these infinitesimals is basically like coming in and changing what the agent does. I think giving the agent a true source of randomness actually does let you implement CDT.

If the environment learns from the other possible worlds, it might punish or reward you in one world for stuff that you do in the other world, so you can't just ask which world is best to figure out what to do.

I agree that that is how you want to think about the matching pennies problem. However, the point is that your proposed solution assumed linearity. It didn't empirically observe linearity. You have to be able to tell the difference between the situations in order to know not to assume linearity in the matching pennies problem. The method for telling the difference is how you determine whether or not, and in what ways, you have logical control over Omega's prediction of you.

Comment by scott-garrabrant on What is ambitious value learning? · 2018-11-01T19:15:06.616Z · score: 15 (6 votes) · LW · GW

A conversation that just went down in my head:

Me: "You observe a that a bunch of attempts to write down what we want get Goodharted, and so you suggest writing down what we want using data. This seems like it will have all the same problems."

Straw You: "The reason you fail is because you can't specify what we really want, because value is complex. Trying to write down human values is qualitatively different from trying to write down human values using a pointer to all the data that happened in the past. That pointer cheats the argument from complexity, since it lets us fit lots of data into a simple instruction."

Me: "But the instruction is not simple! Pointing at what the "human" is is hard. Dealing with the fact that the human in inconsistent with itself gives more degrees of freedom. If you just look at the human actions, and don't look inside the brain, there are many many goals consistent with the actions you see. If you do look inside the brain, you need to know how to interpret that data. None of these are objective facts about the universe that you can just learn. You have to specify them, or specify a way to specify them, and when you do that, you do it wrong and you get Goodharted."

Comment by scott-garrabrant on Decision Theory · 2018-11-01T18:26:10.788Z · score: 16 (6 votes) · LW · GW

So, your suggestion is not just an inconsequential grain of uncertainty, it is a grain of exploration. The agent actually does take 10 with some small probability. If you tried to do this with just uncertainty, things would be worse, since that uncertainty would not be justified.

One problem is that you actually do explore a bunch, and since you don't get a reset button, you will sometimes explore into irreversible actions, like shutting yourself off. However, if the agent has a source of randomness, and also the ability to simulate worlds in which that randomness went another way, you can have an agent that with probability 1 does not explore ever, and learns from the other worlds in which it does explore. So, you can either explore forever, and shut yourself off, or you can explore very very rarely and learn from other possible worlds.

The problem with learning from other possible worlds is that, to get good results out of it, you have to assume that the environment does not also learn from other possible worlds, which is not very embedded.

But you are suggesting actually exploring a bunch, and there is a problem other than just shutting yourself off. You are getting past this problem in this case by only allowing linear functions, but that is not an accurate assumption. Let's say you are playing matching pennies with Omega, who has the ability to predict what probability you will pick but not what action you will pick.

(In matching pennies, you each choose H or T, you win if they match, they win if they don't.)

Omega will pick H if your probability of H is less than 1/2, and T otherwise. Your utility as a function of probability is piecewise linear with two parts. Trying to assume that it will be linear will make things messy.
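
Written out (with p your probability of playing H, and taking the payoff to be 1 for a match and 0 for a mismatch, which is an assumption of this sketch): if p < 1/2, Omega plays H and you match exactly when you play H, so U(p) = p; if p >= 1/2, Omega plays T and you match exactly when you play T, so U(p) = 1 - p. So U is piecewise linear with a kink at p = 1/2, and fitting a single linear function of p to it misrepresents the situation.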

There is this problem where sometimes the outcome of exploring into taking 10, and the outcome of actually taking 10 because it is good are different. More on this here.

Comment by scott-garrabrant on Preface to the sequence on value learning · 2018-11-01T00:00:44.544Z · score: 6 (4 votes) · LW · GW

I don't think this is relevant, but there are theoretical uses for maximizing expected log probability, and maximizing expected log probability is not the same as maximizing expected probability, since they interact with the expectation differently.
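
A small numeric illustration (the numbers are made up for the example): suppose the world is equally likely to be in one of two cases. Model A assigns probability 0.5 to the true outcome in both cases; model B assigns 0.95 in one case and 0.1 in the other. Expected probability prefers B (0.525 vs 0.5), while expected log probability prefers A (log 0.5 ≈ -0.69 vs (log 0.95 + log 0.1)/2 ≈ -1.18), so the two objectives can rank the same pair of models in opposite orders.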

(A -> B) -> A

2018-09-11T22:38:19.866Z · score: 42 (18 votes)
Comment by scott-garrabrant on History of the Development of Logical Induction · 2018-08-29T08:40:47.790Z · score: 3 (2 votes) · LW · GW

fixed

History of the Development of Logical Induction

2018-08-29T03:15:51.889Z · score: 93 (31 votes)
Comment by scott-garrabrant on Bayesian Probability is for things that are Space-like Separated from You · 2018-08-02T22:00:23.389Z · score: 10 (2 votes) · LW · GW

I think you are correct that I cannot cleanly separate the things that are in my past that I know and the things that are in my past that I do not know. For example, if a probability is chosen uniformly at random in the unit interval, and then a coin with that probability is flipped a large number of times, and then I see some of the results, I do not know the true probability, but the coin flips that I see really should come after the thing that determines the probability in my Bayes net.
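
Writing the example in standard notation (just a restatement): with θ uniform on [0,1] and flips X_1, ..., X_n independent given θ, the joint distribution factorizes as p(θ, x_1, ..., x_n) = p(θ) ∏_i p(x_i | θ), so the natural net has arrows θ → X_i; the observed flips sit downstream of θ even though θ is the thing I know least about.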

Comment by scott-garrabrant on Probability is Real, and Value is Complex · 2018-07-20T22:00:10.074Z · score: 6 (3 votes) · LW · GW

The uniqueness of 0 is only roughly equivalent to the half-plane definition if you also assume convexity (i.e., the existence of independent coins of no value).

Comment by scott-garrabrant on Optimization Amplifies · 2018-07-11T15:54:09.711Z · score: 2 (1 votes) · LW · GW

I added the word unit.

Bayesian Probability is for things that are Space-like Separated from You

2018-07-10T23:47:49.130Z · score: 77 (34 votes)
Comment by scott-garrabrant on The Alignment Newsletter #1: 04/09/18 · 2018-06-28T21:41:06.755Z · score: 7 (1 votes) · LW · GW

I think these titles should have dates instead of or in addition to numbers for historical context.

Comment by scott-garrabrant on Optimization Amplifies · 2018-06-27T02:32:48.368Z · score: 18 (6 votes) · LW · GW

I think this is similar to Security Mindset, so you might want to think about this post in relation to that.

Comment by scott-garrabrant on Announcement: AI alignment prize round 2 winners and next round · 2018-06-27T02:08:35.622Z · score: 17 (3 votes) · LW · GW

Ok, I have two other things to submit:

Counterfactual Mugging Poker Game and Optimization Amplifies.

I hope that your decision procedure includes a part where, if I win, you choose whichever subset of my posts you most want to draw attention to. I think that a single post would get a larger signal boost than each post in a group of three, and I would not be offended if one or two of my posts get cut from the announcement post to increase the signal for other things.

Optimization Amplifies

2018-06-27T01:51:18.283Z · score: 97 (35 votes)
Comment by scott-garrabrant on Prisoners' Dilemma with Costs to Modeling · 2018-06-27T00:47:21.925Z · score: 7 (1 votes) · LW · GW

No, sorry. It wouldn't be very readable, and it is easy to do yourself.

Comment by scott-garrabrant on Prisoners' Dilemma with Costs to Modeling · 2018-06-21T22:11:37.564Z · score: 36 (11 votes) · LW · GW

I am actually worried that because I posted it, people will think it is more relevant to AI safety than it really is. I think it is a little related, but not strongly.

I do think it is surprising and interesting. I think it is useful for thinking about civilization and civilizational collapse and what aliens (or maybe AI or optimization daemons) might look like. My inner Andrew Critch also thinks it is more directly related to AI safety than I do. Also if I thought multipolar scenarios were more likely, I might think it is more relevant.

Also it is made out of pieces such that thinking about it was a useful exercise. I am thinking a lot about Nash equilibria and dynamics. I think the fact that Nash equilibria are not exactly a dynamic type of object and are not easy to find is very relevant to understanding embedded agency. Also, I think that modal combat is relevant, because I think that Löbian handshakes are pointing at an important part of reasoning about oneself.

I think it is relevant enough that it was worth doing, and such that I would be happy if someone expanded on it, but I am not planning on thinking about it much more because it does feel only tangentially related.

That being said, many times I have explicitly thought that I was thinking about a thing that was not really related to the bigger problems I wanted to be working on, only to later see a stronger connection.

Comment by scott-garrabrant on Prisoners' Dilemma with Costs to Modeling · 2018-06-14T19:16:42.995Z · score: 9 (2 votes) · LW · GW

That was wrong. Fixed it. Thanks.

Comment by scott-garrabrant on On the Chatham House Rule · 2018-06-14T17:40:42.250Z · score: 35 (11 votes) · LW · GW

I think the comments here point out just how much we do not have common knowledge about this thing that we are pretending we have common knowledge about.

Comment by scott-garrabrant on On the Chatham House Rule · 2018-06-14T02:56:21.748Z · score: 18 (4 votes) · LW · GW

The FLI Beneficial AI workshop and the CHAI annual workshops have both been under the Chatham House Rule, for example. I don't know about outside of AI safety.

Comment by scott-garrabrant on On the Chatham House Rule · 2018-06-13T23:51:40.521Z · score: 23 (8 votes) · LW · GW

I agree that most people do not expect the rules to be treated as sacred. I still want the rules to be such that someone could (without great cost) treat them as sacred if they wanted to.

That or it should be explicitly stated that you are only expected to loosely follow the spirit of the rule.

Counterfactual Mugging Poker Game

2018-06-13T23:34:59.360Z · score: 79 (30 votes)

On the Chatham House Rule

2018-06-13T21:41:05.057Z · score: 50 (27 votes)
Comment by scott-garrabrant on Prisoners' Dilemma with Costs to Modeling · 2018-06-11T17:49:01.185Z · score: 10 (3 votes) · LW · GW

There is this: https://github.com/machine-intelligence/provability

Comment by scott-garrabrant on Announcement: AI alignment prize round 2 winners and next round · 2018-06-05T18:28:23.839Z · score: 25 (5 votes) · LW · GW

I might post something else later this month, but if not, my submission is my new Prisoners' Dilemma thing.

Prisoners' Dilemma with Costs to Modeling

2018-06-05T04:51:30.700Z · score: 163 (60 votes)
Comment by scott-garrabrant on LW Update 2018-05-11 – Suggest Curation  · 2018-05-12T06:21:45.200Z · score: 23 (4 votes) · LW · GW

I am sad that the karma needed to suggest curation is exactly the same as to moderate. I want more goalposts, not fewer.

Comment by scott-garrabrant on Editor Mini-Guide · 2018-05-10T22:18:17.954Z · score: 5 (1 votes) · LW · GW

Mine was in the text editor. Even in the text editor, Cmd-4 sends me to my 4th tab in the window, instead of entering LaTeX.

Comment by scott-garrabrant on Looking for AI Safety Experts to Provide High Level Guidance for RAISE · 2018-05-10T18:30:58.119Z · score: 19 (4 votes) · LW · GW

I think that we should schedule a video chat. I might have a lot of content for you. Email me?

Comment by scott-garrabrant on Editor Mini-Guide · 2018-05-10T18:29:11.540Z · score: 7 (2 votes) · LW · GW

I don't know, but Ctrl-Cmd-4 did work.

Comment by scott-garrabrant on Editor Mini-Guide · 2018-05-09T21:18:11.768Z · score: 11 (2 votes) · LW · GW

Command-4 does not work on Safari... :(

Comment by scott-garrabrant on Knowledge is Freedom · 2018-04-17T22:19:29.867Z · score: 7 (2 votes) · LW · GW

Meta: The word count is very off on this post. I currently see it as 73K. I am not sure what happened, but I believe:

I made a small edit.

I pressed submit.

It appeared that nothing happened.

I pressed submit a bunch of times.

It still appeared that nothing happened.

I went back, and looked at the post.

The edit was made, but the word count became huge.

Comment by scott-garrabrant on Knowledge is Freedom · 2018-04-17T22:12:57.971Z · score: 7 (2 votes) · LW · GW

Fixed, thanks.

Comment by scott-garrabrant on Announcement: AI alignment prize round 2 winners and next round · 2018-04-17T01:41:05.795Z · score: 44 (9 votes) · LW · GW

I think you want to reward output, rather than only output that would not have otherwise happened.

This is similar to the fact that if you want to train calibration, you have to optimize your log score and just observe your lack of calibration as an opportunity to increase your log score.
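
The standard fact behind this: the log score is strictly proper. If an event has true probability p and you report q, your expected log score is p·log q + (1-p)·log(1-q); setting the derivative p/q - (1-p)/(1-q) to zero gives q = p. So honestly optimizing the log score already pulls your reports toward calibration, rather than calibration being a separate thing to train on.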

Comment by scott-garrabrant on Announcement: AI alignment prize round 2 winners and next round · 2018-04-16T17:56:24.826Z · score: 37 (8 votes) · LW · GW

Maybe in the form of a LW sequence.

Comment by scott-garrabrant on Announcement: AI alignment prize round 2 winners and next round · 2018-04-16T03:45:35.466Z · score: 57 (14 votes) · LW · GW

It seems like maybe there should be an archive page for past rounds.

The Chromatic Number of the Plane is at Least 5 - Aubrey de Grey

2018-04-11T18:19:50.419Z · score: 121 (37 votes)
Comment by scott-garrabrant on Announcement: AI alignment prize winners and next round · 2018-03-28T20:41:33.008Z · score: 16 (3 votes) · LW · GW

I just noticed that the first two posts were curated and the second two were not, so maybe the only anti-correlation is between me and the Sunshine Regiment. But IIRC, most of the karma was pre-curation, and I posted Robustness to Scale and No Catastrophes at about the same time and was surprised to see a gap in the karma. (I would have predicted the other direction.)

Comment by scott-garrabrant on Announcement: AI alignment prize winners and next round · 2018-03-28T20:35:41.335Z · score: 22 (4 votes) · LW · GW

Also, why is my opinion anti-correlated with Karma?

Maybe, it is a selection effect where I post stuff that is either good content or a good explanation.

Or maybe important insights have a larger inferential gap.

Or maybe I like new insights and the old insights are better because they survived across time, but they are old to me so I don't find them as exciting.

Or maybe it is noise.

Comment by scott-garrabrant on Announcement: AI alignment prize winners and next round · 2018-03-28T20:27:01.869Z · score: 22 (4 votes) · LW · GW

Yes.

I was waiting until the last minute to see if I would have a clear winner on what to submit. Unfortunately, I do not, since there are four posts on the Pareto frontier of karma and how important I think their insight is. In decreasing order of karma and increasing order of my opinion:

Robustness to Scale

Sources of Intuitions and Data on AGI

Don't Condition on no Catastrophes

Knowledge is Freedom

Can I have you/other judges decide which post/subset of posts you think is best/want to put more signal towards, and consider that my entry?

New Paper Expanding on the Goodhart Taxonomy

2018-03-14T09:01:59.735Z · score: 50 (12 votes)
Comment by scott-garrabrant on Announcement: AI alignment prize winners and next round · 2018-03-09T09:03:58.191Z · score: 33 (7 votes) · LW · GW

Not only would Goodhart Taxonomy probably not have been finished otherwise (it was 20 percent written in my drafts folder for months), but I think writing it jump-started my writing publicly and caused the other posts I've written since.

Comment by scott-garrabrant on Using the universal prior for logical uncertainty (retracted) · 2018-03-03T23:33:27.340Z · score: 14 (3 votes) · LW · GW

Not really.

You can generalize LI to arbitrary collections of hypotheses, and interpret it as being about bit sequences rather than logic, but not much more than that.

The reason the LI paper talks about the LI criterion rather than a specific algorithm is to push in that direction, but it is not as clean as your example.

Is there a Connection Between Greatness in Math and Philosophy?

2018-03-03T23:25:51.206Z · score: 40 (10 votes)
Comment by scott-garrabrant on Using the universal prior for logical uncertainty (retracted) · 2018-03-03T22:34:05.331Z · score: 18 (4 votes) · LW · GW

It is a complaint against Bayes, but it is only a complaint against using Bayes in cases where the real world has probability 0 in your prior.

Part of the point of logical induction is that logic is complicated and no hypothesis in the logical induction algorithm can actually predict it correctly in full, but the algorithm allows for the hypotheses to prove themselves on a sub-pattern, and have the ensemble converge to the correct behavior on that sub-pattern.

Comment by scott-garrabrant on Using the universal prior for logical uncertainty (retracted) · 2018-03-03T20:24:15.453Z · score: 18 (4 votes) · LW · GW

Further, if Solomonoff Induction does get these problems right, it does so because of closure properties of the class of hypotheses, not because of properties of the way in which the hypotheses are combined.

In the Logical Induction framework, if you add a bunch of other uncomputable hypotheses, you will still get the good properties on the predictable sub-patterns of the environment.

If you start with the Solomonoff Induction framework, this is demonstrably not true: If I have an environment which is 1 on even bits and uncomputable on odd bits, I can add an uncomputable hypothesis that knows all the odd bits. It can gain trust over time, then spend that trust to say that the next even bit has a 90% chance to be a 0. It will take a hit from this prediction, but can earn back the trust from the odd bits and repeat.
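
A minimal simulation of this dynamic (my own illustrative construction, not from the comment or any paper; the two hypotheses, the parameters, and the use of random odd bits as a computable stand-in for "uncomputable" ones are all assumptions of the sketch):

```python
import random

random.seed(0)
N = 2000

# Environment: even-indexed bits are always 1; odd-indexed bits are unpredictable.
bits = [1 if i % 2 == 0 else random.randint(0, 1) for i in range(N)]

def p_computable(i, bit):
    # A simple computable hypothesis: P(bit = 1) is 0.9 on even bits, 0.5 on odd bits.
    return 0.9 if i % 2 == 0 else 0.5

def p_adversarial(i, bit):
    # Knows every odd bit exactly; on every 10th even bit it claims the bit is
    # probably 0 (spending trust), otherwise it agrees the bit is probably 1.
    if i % 2 == 1:
        return 1.0 if bit == 1 else 0.0  # P(bit = 1); always right on odd bits
    return 0.1 if (i // 2) % 10 == 0 else 0.9

w_comp, w_adv = 0.5, 0.5      # prior weights in the two-hypothesis mixture
spend_bit_predictions = []    # mixture's P(bit = 1) on late "trust-spending" even bits

for i, b in enumerate(bits):
    q_comp, q_adv = p_computable(i, b), p_adversarial(i, b)
    q_mix = (w_comp * q_comp + w_adv * q_adv) / (w_comp + w_adv)
    if i % 2 == 0 and (i // 2) % 10 == 0 and i > 200:
        spend_bit_predictions.append(q_mix)
    # Bayesian update: multiply each weight by the probability it assigned to the
    # observed bit, then renormalize to avoid underflow.
    w_comp *= q_comp if b == 1 else 1 - q_comp
    w_adv *= q_adv if b == 1 else 1 - q_adv
    total = w_comp + w_adv
    w_comp, w_adv = w_comp / total, w_adv / total

print("posterior weight on the adversarial hypothesis:", round(w_adv, 4))
print("mixture's average P(even bit = 1) on trust-spending bits:",
      round(sum(spend_bit_predictions) / len(spend_bit_predictions), 3))
# Even bits are always 1, yet on the trust-spending bits the mixture assigns roughly 0.1,
# because the adversarial hypothesis keeps earning its losses back on the odd bits it
# predicts perfectly.
```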

Comment by scott-garrabrant on Mapping the Archipelago · 2018-02-26T19:47:27.151Z · score: 43 (9 votes) · LW · GW

Note: It seems easy to conflate theory with epistemic and practice with instrumental. I think this is a bad combination of buckets, and when I say theory here, I do not exclude theoretical instrumental rationality.

Comment by scott-garrabrant on Mapping the Archipelago · 2018-02-26T19:46:18.986Z · score: 53 (11 votes) · LW · GW

To me, the best content in the old Less Wrong does not fit on any of your islands. It was developing theoretical rationality.

It would probably go on or near the AI safety island, but I feel like that does not fairly represent its generality.

Comment by scott-garrabrant on Arguments about fast takeoff · 2018-02-26T19:34:59.071Z · score: 23 (6 votes) · LW · GW

I first interpreted your operationalization of slow takeoff to mean something that is true by definition (assuming the economy is strictly increasing), since any 1-year doubling interval is contained in a 4-year interval over which output also doubles.

I assume how you wanted me to interpret it is that the first 4-year doubling interval is disjoint from the first 1-year doubling interval (the 4-year one ends before the 1-year one starts).

Robustness to Scale

2018-02-21T22:55:19.155Z · score: 159 (44 votes)

Don't Condition on no Catastrophes

2018-02-21T21:50:31.077Z · score: 82 (26 votes)

A Proper Scoring Rule for Confidence Intervals

2018-02-13T01:45:06.341Z · score: 97 (29 votes)

Knowledge is Freedom

2018-02-09T05:24:54.932Z · score: 62 (15 votes)

Sources of intuitions and data on AGI

2018-01-31T23:30:17.176Z · score: 152 (47 votes)

Goodhart Taxonomy

2017-12-30T17:19:47.000Z · score: 0 (0 votes)

The Three Levels of Goodhart's Curse

2017-12-30T16:41:25.000Z · score: 3 (3 votes)

Goodhart Taxonomy

2017-12-30T16:38:39.661Z · score: 163 (57 votes)

Logical Updatelessness as a Robust Delegation Problem

2017-11-30T04:23:24.000Z · score: 0 (0 votes)

Logical Updatelessness as a Robust Delegation Problem

2017-10-27T21:16:18.076Z · score: 48 (14 votes)

Conditioning on Conditionals

2017-08-17T01:15:08.000Z · score: 7 (1 votes)

Cooperative Oracles: Nonexploited Bargaining

2017-06-03T00:39:55.000Z · score: 4 (4 votes)

Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima

2017-06-03T00:38:46.000Z · score: 3 (3 votes)

Cooperative Oracles: Introduction

2017-06-03T00:36:17.000Z · score: 3 (3 votes)

Entangled Equilibria and the Twin Prisoners' Dilemma

2017-06-02T22:09:00.000Z · score: 2 (2 votes)

Two Major Obstacles for Logical Inductor Decision Theory

2017-04-17T21:10:55.000Z · score: 8 (7 votes)

Prediction Based Robust Cooperation

2017-02-22T01:52:51.000Z · score: 1 (1 votes)

postCDT: Decision Theory using post-selected Bayes nets

2016-11-06T22:22:32.000Z · score: 3 (3 votes)

Updatelessness and Son of X

2016-11-04T22:58:23.000Z · score: 3 (3 votes)

A failed attempt at Updatelessness using Universal Inductors

2016-11-03T20:25:09.000Z · score: 2 (2 votes)

Transitive negotiations with counterfactual agents

2016-10-20T23:27:04.000Z · score: 3 (3 votes)

The set of Logical Inductors is not Convex

2016-09-27T09:05:00.000Z · score: 3 (3 votes)

Logical Inductors contain Logical Inductors over other complexity classes

2016-09-26T22:17:44.000Z · score: 3 (3 votes)

Logical Inductors that trust their limits

2016-09-20T23:17:55.000Z · score: 3 (3 votes)

Universal Inductors

2016-09-14T00:09:42.000Z · score: 6 (6 votes)

The many counterfactuals of counterfactual mugging

2016-04-12T20:04:38.000Z · score: 2 (2 votes)

Another Concise Open Problem

2016-01-28T23:04:06.000Z · score: 2 (2 votes)

Second Failure of Inductive Learning with a Delay in Feedback

2016-01-28T21:32:09.000Z · score: 2 (2 votes)