**so8res** on On motivations for MIRI's highly reliable agent design research · 2017-01-29T16:01:12.000Z · score: 0 (0 votes) · LW · GW

The second statement seems pretty plausible (when we consider human-accessible AGI designs, at least), but I'm not super confident of it, and I'm not resting my argument on it.

The weaker statement you provide doesn't seem like it's addressing my concern. I expect there *are* ways to get highly capable reasoning (sufficient for, e.g., gaining decisive strategic advantage) without understanding low-K "good reasoning"; the concern is that said systems are much more difficult to align.

**so8res** on On motivations for MIRI's highly reliable agent design research · 2017-01-29T01:53:34.000Z · score: 3 (3 votes) · LW · GW

As I noted when we chatted about this in person, my intuition is less "there is some small core of good consequentialist reasoning (it has “low Kolmogorov complexity” in some sense), and this small core will be quite important for AI capabilities" and more "good consequentialist reasoning is low-K and those who understand it will be better equipped to design AGI systems where the relevant consequentialist reasoning happens in transparent boxes rather than black boxes."

Indeed, if I thought one *had* to understand good consequentialist reasoning in order to design a highly capable AI system, I'd be less worried by a decent margin.

**so8res** on My current take on the Paul-MIRI disagreement on alignability of messy AI · 2017-01-13T02:11:45.000Z · score: 0 (0 votes) · LW · GW

Weighing in late here, I'll briefly note that my current stance on the difficulty of philosophical issues is (in colloquial terms) "for the love of all that is good, *please don't attempt to implement CEV with your first transhuman intelligence*". My strategy at this point is very much "build the minimum AI system that is capable of stabilizing the overall strategic situation, and then buy a whole lot of time, and then use that time to figure out what to do with the future." I might be more optimistic than you about how easy it will turn out to be to find a reasonable method for extrapolating human volition, but I suspect that that's a moot point either way, because regardless, thou shalt not attempt to implement CEV with humanity's very first transhuman intelligence.

Also, +1 to the overall point of "also pursue other approaches".

**so8res** on MIRI's 2016 Fundraiser · 2016-11-03T22:24:06.990Z · score: 0 (0 votes) · LW · GW

Thanks :-)

**so8res** on MIRI's 2016 Fundraiser · 2016-11-01T23:30:26.238Z · score: 0 (0 votes) · LW · GW

Thanks!

**so8res** on MIRI's 2016 Fundraiser · 2016-10-19T23:21:01.143Z · score: 1 (1 votes) · LW · GW

Fixed, thanks.

**so8res** on MIRI's 2016 Fundraiser · 2016-10-04T20:41:49.220Z · score: 2 (2 votes) · LW · GW

Huh, thanks for the heads up. If you use an ad-blocker, try pausing that and refreshing. Meanwhile, I'll have someone look into it.

**so8res** on MIRI's 2016 Fundraiser · 2016-09-26T18:39:53.142Z · score: 3 (3 votes) · LW · GW

Thanks!

## MIRI's 2016 Fundraiser

2016-09-25T16:55:43.899Z · score: 20 (21 votes)

**so8res** on Double Corrigibility: better Corrigibility · 2016-05-02T19:25:55.266Z · score: 4 (4 votes) · LW · GW

FYI, this is not what the word "corrigibility" means in this context. (Or, at least, it's not how we at MIRI have been using it, and it's not how Stuart Russell has been using it, and it's not a usage that I, as one of the people who originally brought that word into the AI alignment space, endorse.) We use the phrase "utility indifference" to refer to what you're calling "corrigibility", and we use the word "corrigibility" for the broad vague problem that "utility indifference" was but one attempt to solve.

By analogy, imagine people groping around in the dark attempting to develop probability theory. They might call the whole topic the topic of "managing uncertainty," and they might call specific attempts things like "fuzzy logic" or "multi-valued logic" before eventually settling on something that seems to work pretty well (which happened to be an attempt called "probability theory.") We're currently reserving the "corrigibility" word for the analog of "managing uncertainty"; that is, we use the "corrigibility" label to refer to the *highly general* problem of developing AI algorithms that cause a system to (in an intuitive sense) reason without incentives to deceive/manipulate, and to reason (vaguely) as if it's still under construction and potentially dangerous :-)

**so8res** on Safety engineering, target selection, and alignment theory · 2015-12-31T16:28:33.075Z · score: 5 (7 votes) · LW · GW

> By your analogy, one of the main criticism of doing MIRI-style AGI safety research now is that it's like 10th century Chinese philosophers doing Saturn V safety research based on what they knew about fire arrows.

This is a fairly common criticism, yeah. The point of the post is that MIRI-style AI alignment research is less like this and more like Chinese mathematicians researching calculus and gravity, which is still difficult, but much easier than attempting to do safety engineering on the Saturn V far in advance :-)

## Safety engineering, target selection, and alignment theory

2015-12-31T15:43:18.481Z · score: 17 (18 votes)

**so8res** on Why CFAR? The view from 2015 · 2015-12-18T17:37:15.501Z · score: 13 (13 votes) · LW · GW

Yes, precisely. (Transparency illusion strikes again! I had considered it obvious that the default outcome was "a few people are nudged slightly more towards becoming AI alignment researchers someday", and that the outcome of "actually cause at least one very talented person to become an AI alignment researcher who otherwise would not have, over the course of three weeks" was clearly in "resounding success" territory, whereas "turn half the attendees into AI alignment researchers" is in I'll-eat-my-hat territory.)

**so8res** on Why CFAR? The view from 2015 · 2015-12-18T02:44:40.765Z · score: 11 (13 votes) · LW · GW

I don't claim that it developed skill and talent in all participants, nor even in the median participant. I do stand by my claim that it appears to have had drastic good effects on a few people though, and that it led directly to MIRI hires, at least one of which would not have happened otherwise :-)

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-15T16:26:08.074Z · score: 2 (2 votes) · LW · GW

Thanks! :-p It's convenient to have the 2015 fundraisers end before 2015 ends, but we may well change the way fundraisers work next year.

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-15T04:45:26.581Z · score: 2 (2 votes) · LW · GW

Thanks! Our languages and frameworks definitely have been improving greatly over the last year or so, and I'm excited to see where we go now that we've pulled a sizable team together.

**so8res** on LessWrong 2.0 · 2015-12-13T23:55:29.262Z · score: 12 (12 votes) · LW · GW

I have the requisite decision-making power. I hereby delegate Vaniver to come up with a plan of action, and will use what power I have to see that that plan gets executed, so long as the plan seems unlikely to do more harm than good (but regardless of whether I think it will work). Vaniver and the community will need to provide the personpower and the funding, of course.

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-11T23:08:03.037Z · score: 5 (5 votes) · LW · GW

Thanks! And thanks again for your huge donation in the summer; I was not expecting more.

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-10T19:34:16.359Z · score: 2 (2 votes) · LW · GW

Thanks!

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-10T19:34:09.903Z · score: 3 (3 votes) · LW · GW

Thanks!

## MIRI's 2015 Winter Fundraiser!

2015-12-09T19:00:32.567Z · score: 28 (29 votes)

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-08T15:28:56.733Z · score: 3 (3 votes) · LW · GW

Thanks!

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-08T04:05:19.330Z · score: 3 (3 votes) · LW · GW

Thanks!

**so8res** on MIRI's 2015 Winter Fundraiser! · 2015-12-08T00:23:55.128Z · score: 5 (5 votes) · LW · GW

Thanks!

**so8res** on The trouble with Bayes (draft) · 2015-10-26T19:23:49.907Z · score: 1 (1 votes) · LW · GW

I mostly agree here, though I'm probably less perturbed by the six-year time gap. It seems to me like most of the effort in this space has been going towards figuring out how to handle logical uncertainty and logical counterfactuals (with some reason to believe that answers will bear on the question of how to generate priors), with comparatively little work going into things like naturalized induction that attack the problem of priors more directly.

Can you say any more about alternatives you've been considering? I can easily imagine a case where we look back and say "actually the entire problem was about generating a prior-like-thingy" but I have a harder time visualizing different tacks altogether (that don't eventually have some step that reads "then treat observations like Bayesian evidence").

**so8res** on The trouble with Bayes (draft) · 2015-10-25T23:25:40.495Z · score: 1 (1 votes) · LW · GW

Yeah, I also have nontrivial odds on "something UDTish is more fundamental than Bayesian inference" / "there are no probabilities only values" these days :-)

**so8res** on The trouble with Bayes (draft) · 2015-10-24T18:19:32.502Z · score: 5 (5 votes) · LW · GW

If the Bayesian's ignoring information, then you gave them the wrong prior. As far as I can tell, the objection is that the prior over theta which doesn't ignore the information depends on pi, and intuitions say that Bayesians should think that pi should be independent from theta. But if theta can be chosen in response to pi, then the Bayesian prior over theta had better depend on pi.

I wasn't saying that this problem is "adversarial" in the "you're punishing Bayesians therefore I don't have to win" way; I agree that that would be a completely invalid argument. I was saying "if you want me to succeed even when theta is chosen by someone who doesn't like me after pi is chosen, I need a prior over theta which depends on pi." Then everything works out, except that Robins and Wasserman complain that this is torturing Bayesianism to give a frequentist answer. To that, I shrug. If you want me to get the frequentist result ("no matter which theta you pick I converge"), then the result will look frequentist. Not much surprise there.

> This is a very natural problem that comes up constantly.

You realize that the Bayesian gets the right answer way faster than the frequentist in situations where theta is discrete, or sufficiently smooth, or parametric, right? I doubt you find problems like this where theta is non-parametric and utterly discontinuous "naturally" or "constantly". But *even if you do*, the Bayesian will still succeed with a prior over theta that is independent of pi, *except* when pi is so complicated, and theta so discontinuous and so precisely tailored to hiding information in the places that pi makes very, very difficult to observe, that the only way you can learn theta is by knowing that it's been tailored to that particular pi. (The frequentist is essentially always assuming that theta is tailored to pi in this way, because they're essentially acting like theta might have been selected by an adversary, because that's what you do if you want to converge in all cases.) And *even in that case* the Bayesian can succeed by putting a prior on theta that depends on pi. What's the problem?

Imagine there's a game where the two of us will both toss an infinite number of uncorrelated fair coins, and then check which real numbers are encoded by these infinite bit sequences. Using any sane prior, I'll assign measure zero to the event "we got the same real number." If you're then like "Aha! But what if my coin actually always returns the same result as yours?" then I'm going to shrug and use a prior which assigns some non-zero probability to a correlation between our coins.

Robins and Wasserman's game is similar. We're imagining a non-parametric theta that's very difficult to learn about, which is like the first infinite coin sequence (and their example does require that it encode infinite information). Then we also imagine that there's some function pi which makes certain places easier or harder to learn about, which is like the second coin sequence. Robins and Wasserman claim, roughly, that for some finite set of observations and sufficiently complicated pi, a reasonable Bayesian will place ~zero probability on theta just happening to hide all its terrible discontinuities in that pi in just such a way that the only way you can learn theta is by knowing that it is one of the thetas that hides its information in that particular pi; this would be like the coin sequences coinciding. Fine, I agree that under sane priors and for sufficiently complex functions pi, that event has measure zero -- if theta is as unstructured as you say, it would take an infinite confluence of coincident events to make it one of the thetas that happens to hide all its important information precisely such that this particular pi makes it impossible to learn.

If you then say "Aha! Now I'm going to score you by your performance against precisely those thetas that hide in that pi!" then I'm going to shrug and require a prior which assigns some non-zero probability to theta being one of the thetas that hides its info in pi.

That normally wouldn't require any surgery to the intuitive prior (I place positive but small probability on any finite pair of sequences of coin tosses being identical), but if we're assuming that it actually takes an infinite confluence of coincident events for theta to hide its info in pi and you still want to measure me against thetas that do this, then yeah, I'm going to need a prior over theta that depends on pi. You can cry "that's violating the spirit of Bayes" all you want, but it still works.

And in the real world, I *do* want a prior which can eventually say "huh, our supposedly independent coins have come up the same way 2^trillion times, I wonder if they're actually correlated?" or which can eventually say "huh, this theta sure seems to be hiding lots of very important information in the places that pi makes it super hard to observe, I wonder if they're actually correlated?" so I'm quite happy to assign some (possibly very tiny) non-zero prior probability to a correlation between the two of them. Overall, I don't find this problem perturbing.
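To make the "eventually notice the correlation" point concrete, here is a minimal sketch under a deliberately simplified, hypothetical model with only two hypotheses: "the coins always agree" (prior probability `prior`) versus "the coins are independent and fair". Each match is twice as likely under the first hypothesis, so k straight matches shift the odds by a factor of 2^k. The function name `posterior_correlated` is illustrative, not from any library.

```python
from fractions import Fraction


def posterior_correlated(prior: Fraction, k: int) -> Fraction:
    """Posterior probability that two coins always agree, after k straight matches.

    Hypothetical two-hypothesis model (an illustrative assumption):
      C: the coins are perfectly correlated, so P(k matches | C) = 1
      I: the coins are independent and fair, so P(k matches | I) = 2**-k
    Exact rational arithmetic avoids underflow for large k.
    """
    like_c = Fraction(1)
    like_i = Fraction(1, 2 ** k)
    evidence = prior * like_c + (1 - prior) * like_i
    return prior * like_c / evidence


if __name__ == "__main__":
    tiny = Fraction(1, 10 ** 12)  # a very small but non-zero prior on correlation
    for k in (0, 30, 40, 100):
        print(k, float(posterior_correlated(tiny, k)))
```

With a prior of 10^-12, roughly 40 matches are enough to make the correlation hypothesis more likely than not (2^40 is about 1.1 × 10^12), whereas a prior of exactly zero never moves no matter how many matches come in.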

> You can't really say "oh I believe in the likelihood principle," and then rule out examples where the principle fails as unnatural or adversarial.

I agree completely!

**so8res** on The trouble with Bayes (draft) · 2015-10-24T16:37:04.193Z · score: 4 (4 votes) · LW · GW

I understand the "no methods only justifications" view, but it's much less comforting when you need to ultimately build a reliable reasoning system :-)

I remain mostly unperturbed by this game. You made a very frequentist demand. From a Bayesian perspective, your demand is quite a strange one. If you force me to achieve it, then yeah, I may end up doing frequentist-looking things.

In attempts to steel-man the Robins/Wasserman position, it seems the place I'm supposed to be perturbed is that I can't even achieve the frequentist result unless I'm willing to make my prior for theta depend on pi, which seems to violate the spirit of Bayesian inference?

Ah, and now I think I see what's going on! The game that corresponds to a Bayesian desire for this frequentist property is not the game listed; it's the variant where theta is chosen *adversarially* by someone who doesn't want you to end up with a good estimate for psi. (Then the Bayesian wants a guarantee that they'll converge for every theta.) But those are precisely the situations where the Bayesian *shouldn't* be ignoring pi; the adversary will hide as much contrary data as they can in places that are super-difficult for the spies to observe.
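A toy, one-dimensional illustration of that adversarial case, with step functions chosen purely for illustration (the names `pi`, `theta_adversarial`, and `naive_estimate` are hypothetical): theta is picked after pi so that the treasure sits where the spies usually fail. An estimator that ignores pi and simply averages the spied outcomes (roughly what a Bayesian whose prior treats theta as constant and independent of pi would converge to) settles near E[pi·theta]/E[pi] rather than psi.

```python
import random


def pi(x: float) -> float:
    """Known spying success probability: high on the left half, low on the right."""
    return 0.9 if x < 0.5 else 0.05


def theta_adversarial(x: float) -> float:
    """Chosen *after* seeing pi: treasure is put where the spies rarely succeed."""
    return 0.1 if x < 0.5 else 0.9


# psi is the integral of theta over [0, 1]; the naive limit is E[pi*theta] / E[pi].
PSI_TRUE = 0.5 * 0.1 + 0.5 * 0.9
NAIVE_LIMIT = (0.5 * 0.9 * 0.1 + 0.5 * 0.05 * 0.9) / (0.5 * 0.9 + 0.5 * 0.05)


def naive_estimate(n: int, seed: int = 0) -> float:
    """Fraction of treasure among spied points, ignoring pi entirely."""
    rng = random.Random(seed)
    hits = seen = 0
    for _ in range(n):
        x = rng.random()                       # X uniform on the 1-D island
        y = rng.random() < theta_adversarial(x)
        if rng.random() < pi(x):               # spies succeed with probability pi(x)
            seen += 1
            hits += y
    return hits / seen
```

Here psi is 0.5, while the pi-ignoring average converges to 0.0675/0.475 ≈ 0.142, because the spies mostly report from the left half, where the adversary put little treasure.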

Robins and Wasserman say "once a subjective Bayesian queries the randomizer (who selected pi) about the randomizer’s reasoned opinions concerning theta (but not pi) the Bayesian will have independent priors." They didn't show their math on this, but I doubt this point carries their objection. If I ask the person who selected pi how theta was selected, and they say "oh, it was selected in response to pi to cram as much important data as possible into places that are extraordinarily difficult for spies to enter," then I'm willing to buy that *after updating* (which I will do) I now have a distribution over theta that's independent of pi. But this new distribution will be one where I'll eventually converge to the right answer on this particular pi!

So yeah, if I'm about to start playing the treasure hunting game, and then somebody informs me that theta was actually chosen adversarially after pi was chosen, I'm definitely going to need to update on pi. Which means that if we add an adversary to the game, my prior must depend on pi. Call it forced if you will; but it seems correct to me that if you tell me the game might be adversarial (thus justifying your frequentist demand) then I will expect theta to sometimes be dependent on pi (in the most inconvenient possible way).

**so8res** on The trouble with Bayes (draft) · 2015-10-24T15:48:49.833Z · score: 4 (4 votes) · LW · GW

Sure! I would like to clarify, though, that by "logically omniscient" I also meant "while being way larger than everything else in the universe." I'm also readily willing to admit that Bayesian probability theory doesn't get anywhere near solving *decision* theory; that's an entirely different can of worms where there's still lots of work to be done. (Bayesian probability theory alone does not prescribe two-boxing, in fact; that requires the addition of some decision theory which tells you how to compute the consequences of actions given a probability distribution, which is way outside the domain of Bayesian inference.)

Bayesian reasoning is an idealized method for building accurate world-models when you're the biggest thing in the room; two large open problems are (a) modeling the world when you're *smaller* than the universe and (b) computing the counterfactual consequences of actions from your world model. Bayesian probability theory sheds little light on either; nor is it intended to.

I personally don't think it's that useful to consider cases like "but what if there are two logically omniscient reasoners in the same room?" and then demand a coherent probability distribution. Nevertheless, you can do that, and in fact we've recently solved that problem (Benya and Jessica Taylor will be presenting it at LORI V next week); the answer, assuming the usual decision-theoretic assumptions, is "they play Nash equilibria", as you'd expect :-)

**so8res** on The trouble with Bayes (draft) · 2015-10-24T02:42:36.474Z · score: 7 (7 votes) · LW · GW

As for the Robins / Wasserman example, here are my initial thoughts. I'm not entirely sure I'm understanding their objection correctly, but at a first pass, nothing seems amiss. I'll start by gamifying their situation, which helps me understand it better. Their situation seems to work as follows: Imagine an island with a d-dimensional surface (set d=2 for easy visualization). Anywhere along the island, we can dig for treasure, but only if that point on the island is unoccupied. At the beginning of the game, all points on the island are occupied. But people sometimes leave the points with uniform probability, in which case the point can be acquired and whoever acquires it can dig for treasure at that point. (The Xi variables on the blog are points on the island that become unoccupied during the game; we assume this is a uniformly random process.)

We're considering investing in a given treasure-digging company that's going to acquire land and dig on this island. At each point on the island, there is some probability of it having treasure. What we want to know, so that we can decide whether to invest, is how much treasure is on the island. We will first observe the treasure company acquire n points of land and dig there, and then we will decide whether to invest. (The Yi variables record whether there is treasure at the corresponding Xi. There is some function theta(x) which determines the probability of treasure at x. We want to estimate the unconditional probability of treasure at a uniformly random point on the island; this is psi, which is the integral of theta(x) dx.)

However, the company tries to hide facts about whether or not they actually struck treasure. What we do is, we hire a spy firm. Spies aren't perfect, though, and some points are harder to spy on than others (if they're out in the open, or have little cover, etc.) For each point on the island, there is some probability of the spies succeeding at observing the treasure diggers. We, fortunately, know exactly how likely the spies are to succeed at any given point. If the spies succeed in their observation, they tell us for sure whether the diggers found treasure. (The successes of the spies are the Ri variables. pi(x) is the probability of successfully spying at point x.)

To summarize, we have three series of variables Xi, Yi, and Ri. The triples (Xi, Yi, Ri) are i.i.d.; Yi and Ri are conditionally independent given Xi; and the Xi are uniformly distributed. There is some function theta(x) which tells us how likely there is to be treasure at any given point, and there's some other function pi(x) which tells us how likely the spies are to successfully observe x. Our task is to estimate psi, the probability of treasure at a random point on the island, which is the integral of theta(x) dx.
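A minimal sketch of the data-generating process just summarized, assuming (for illustration only) that the island is the unit cube [0, 1]^d and that theta and pi are supplied as Python functions; `sample_game` and `psi_midpoint` are hypothetical names. The spies' draw R_i is made independently of Y_i given X_i, and Y_i is hidden whenever R_i = 0.

```python
import itertools
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Field = Callable[[Point], float]  # maps an island point to a probability


def sample_game(n: int, d: int, theta: Field, pi: Field,
                seed: int = 0) -> List[Tuple[Point, Optional[int], int]]:
    """Draw n i.i.d. triples (X_i, Y_i, R_i) from the treasure game.

    X_i ~ Uniform([0, 1]^d); Y_i ~ Bernoulli(theta(X_i)) says whether there is
    treasure; R_i ~ Bernoulli(pi(X_i)) says whether the spies succeeded, drawn
    independently of Y_i given X_i.  Y_i is reported only when R_i == 1.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = tuple(rng.random() for _ in range(d))
        y = int(rng.random() < theta(x))
        r = int(rng.random() < pi(x))
        draws.append((x, y if r else None, r))
    return draws


def psi_midpoint(theta: Field, d: int, k: int = 50) -> float:
    """Approximate psi, the integral of theta over [0, 1]^d, on a k^d midpoint grid."""
    mids = [(j + 0.5) / k for j in range(k)]
    total = sum(theta(x) for x in itertools.product(mids, repeat=d))
    return total / k ** d


if __name__ == "__main__":
    theta = lambda x: x[0] * x[1]    # illustrative treasure field, d = 2
    pi = lambda x: 0.2 + 0.6 * x[0]  # illustrative spying field
    print(psi_midpoint(theta, 2))    # 0.25 up to rounding
    print(sample_game(3, 2, theta, pi))
```

Note that the grid integral is only practical for small d; the cost grows as k^d, which is one way to see why a high-dimensional island is hard to learn about point by point.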

The game works as follows: n points x1..xn open on the island, and we observe that those points were acquired by the treasure diggers, and for some of them we send out our spy agency to maybe learn theta(xi). Robins and Wasserman argue something like the following (afaict):

"You observe finitely many instances of theta(x). But the surface of the island is continuous and huge! You've observed a teeny tiny fraction of Y-probabilities at certain points, and you have no idea how theta varies across the space, so you've basically gained zero information about theta and therefore psi."

To which I say: Depends on your prior over theta. If you assume that theta can vary wildly across the space, then observing only finitely many theta(xi) tells you almost nothing about theta in general, to be sure. In that case, you learn almost nothing by observing finitely many points -- nor should you! If instead you assume that the theta(xi) do give you lots of evidence about theta in general, then you'll end up with quite a good estimate of psi. If your prior has you somewhere in between, then you'll end up with an estimate of psi that's somewhere in between, as you should. The function pi doesn't factor in at all unless you have reason to believe that pi and theta are correlated (e.g. it's easier to spy on points that don't have treasure, or something), but Robins and Wasserman state explicitly that they don't want to consider those scenarios. (And I'm fine with assuming that pi and theta are uncorrelated.)
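Why pi drops out, under the stated assumption that the prior on theta is independent of pi: for one draw, the probability of observing (x, r, y) factors as p(x) · pi(x)^r (1 − pi(x))^(1−r) · [theta(x)^y (1 − theta(x))^(1−y)]^r. Since pi is known, the middle factor is constant as a function of theta, so it cancels in Bayes' rule. A minimal sketch with the simplest possible prior, a hypothetical "theta is constant" model with a Beta(a, b) prior on that constant (so psi equals it); `beta_posterior` and `demo` are illustrative names.

```python
import random
from typing import Iterable, Optional, Tuple

Observation = Tuple[float, Optional[int], int]  # (x, y or None, r)


def beta_posterior(data: Iterable[Observation],
                   a: float = 1.0, b: float = 1.0) -> Tuple[float, float]:
    """Beta(a, b) -> Beta(a', b') update for a constant theta(x) = c.

    y is 1 (treasure) or 0 when r == 1, and None when the spies failed.
    The known spying probability pi(x) never appears: its likelihood factor
    pi(x)**r * (1 - pi(x))**(1 - r) does not depend on c, so it cancels.
    """
    for _x, y, r in data:
        if r:  # unspied points contribute a factor that is constant in c
            a += y
            b += 1 - y
    return a, b


def demo(n: int = 20_000, c: float = 0.3, seed: int = 1) -> float:
    """Posterior mean of psi = c when pi varies across the island independently of theta."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(rng.random() < c)
        r = int(rng.random() < 0.2 + 0.6 * x)  # illustrative pi(x)
        data.append((x, y if r else None, r))
    a, b = beta_posterior(data)
    return a / (a + b)
```

Under a prior that lets theta vary wildly across a high-dimensional island, the same cancellation happens, but each spied point then says little about theta elsewhere, which is the "learn almost nothing, nor should you" case.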

(The frequentist approach takes pi into account anyway and ends up eventually concentrating its probability mass mostly around one point psi in the space of possible psi values, causing me to frown very suspiciously, because we were assuming that pi doesn't tell us anything about psi.)

Robins and Wasserman then argue that the frequentist approach gives the following guarantee: No matter what function theta(x) determines the probability of treasure at x, they only need to observe finitely many points before their estimate for psi is "close" to the true psi (which they define formally). They argue that Bayesians have a very hard time generating a prior that has this property. (They note that it is possible to construct a prior that yields an estimate similar to the frequentist estimate, but that this requires torturing the prior until it gives a frequentist answer, at which point, why not just become a frequentist?)
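For reference, the frequentist estimator usually associated with this example is, as I understand it, the Horvitz–Thompson (inverse-probability-weighted) estimator: average R_i·Y_i/pi(X_i). A minimal sketch on a one-dimensional island (an illustrative assumption); `horvitz_thompson`, `jagged_theta`, and `linear_pi` are hypothetical names. The guarantee comes from E[R·Y/pi(X) | X] = pi(X)·theta(X)/pi(X) = theta(X), which holds for every theta, so the estimate is unbiased for psi and its variance is bounded by 1/min(pi), independent of how smooth theta is.

```python
import random
from typing import Callable

Field1D = Callable[[float], float]


def horvitz_thompson(n: int, theta: Field1D, pi: Field1D, seed: int = 0) -> float:
    """Inverse-probability-weighted estimate of psi = integral of theta over [0, 1].

    Each draw contributes R * Y / pi(X).  Because R and Y are independent given X,
    E[R * Y / pi(X) | X] = theta(X), so the average is unbiased for psi whatever
    theta is, as long as pi is known and bounded away from zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.random()
        y = rng.random() < theta(x)
        r = rng.random() < pi(x)
        if r and y:  # only spied draws that found treasure contribute
            total += 1.0 / pi(x)
    return total / n


def jagged_theta(x: float) -> float:
    """Illustrative, wildly discontinuous theta: 1000 alternating stripes, psi = 0.5."""
    return 0.9 if int(1000 * x) % 2 == 0 else 0.1


def linear_pi(x: float) -> float:
    """Illustrative known spying probability, bounded below by 0.2."""
    return 0.2 + 0.6 * x
```

The convergence rate here depends on how small pi can get, not on how jagged theta is, which is the property Robins and Wasserman are asking the Bayesian to match.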

I say, sure, it's hard (though not impossible) for a Bayesian to get that sort of guarantee. But nothing is amiss here! Two points:

(a) They claim that it's disconcerting that the theta(xi) don't give a Bayesian much information about theta. They admit that there are priors on theta that allow you to get information about theta from finitely many theta(xi), but protest that these theta are pretty weird ("very very very smooth") if the dimensionality d of the island is very high. In which case I say, if you think that the theta(xi) can't tell you much about theta, then you *shouldn't* be learning about theta when you learn about the various theta(xi)! In fact, I'm suspicious of anyone who says they can, under these assumptions.

Also, I'm not completely convinced that "the observations are uninformative about theta" implies "the observations are uninformative about psi" -- I acknowledge that from theta you can compute psi, and thus in some sense theta is the "only unknown," but I think you might be able to construct a prior where you learn little about theta but lots about psi. (Maybe the i.i.d. assumption rules this possibility out? I'm not sure yet, I haven't done the math.) But assume we either don't have any way of getting information about psi except by integrating theta, or that we don't have a way of doing it except one that looks "tortured" (because otherwise their argument falls through anyway). That brings us to my second point:

(b) They ask for the property that, no matter what theta is the true theta, you, after only finitely many trials, assign very high probability to the true value of psi. That's a crazy demand! What if the true theta is one where learning finitely many theta(xi) *doesn't* give you any information about theta? If we have a theta such that my observations are telling me nothing about it, then I don't *want* to be slowly concentrating all my probability mass on one particular value of psi; that would be mad. (Unless the observations are giving me information about psi via some mechanism other than information about theta, which we're assuming is not the case.)

If the game is *really* working like they say it is, then the frequentist is often concentrating probability around some random psi for no good reason, and when we actually draw random thetas and check who predicted better, we'll see that they actually converged around completely the wrong values. Thus, I doubt the claim that, setting up the game exactly as given, the frequentist converges on the "true" value of psi. If we assume the frequentist does converge on the right answer, then I strongly suspect either (1) we should be using a prior where the observations are informative about psi even if they aren't informative about theta or (2) they're making an assumption that amounts to forcing us to use the "tortured" prior. I wouldn't be too surprised by (2), given that their demand on the posterior is a very frequentist demand, and so asserting that it's *possible* to zero in on the true psi using this data in finitely many steps for any theta may very well amount to asserting that the prior is the tortured one that forces a frequentist-looking calculation. They don't describe the "tortured prior" in the blog post, so I'm not sure what else to say here ¯\_(ツ)_/¯

There are definitely some parts of the argument I'm not following. For example, they claim that for simple functions pi, the Bayesian solution obviously works, but there's no single prior on theta which works for any pi no matter how complex. I'm very suspicious about this, and I wonder whether what they mean is that there's no *sane* prior which works for any pi, and that that's the place they're slipping the "but you can't be logically omniscient!" objection in, at which point yes, Bayesian reasoning is not the right tool. Unfortunately, I don't have any more time to spend digging at this problem. By and large, though, my conclusion is this:

If you set the game up as stated, and the observations are actually giving literally zero data about psi, then I will be sticking to my prior on psi, thankyouverymuch. If a frequentist assumes they can use pi to update and zooms off in one direction or another, then they will be wrong most of the time. If you also say the frequentist is performing well then I deny that the observations were giving no info. (By the time they've converged, the Bayesian must also have data on theta, or at least psi.) If it's possible to zero in on the true value of psi after finitely many observations, then I'm going to have to use a prior that allows me to do so, regardless of whether or not it appears tortured to you :-)

(Thanks to Benya for helping me figure out what the heck was going on here.)

**so8res** on The trouble with Bayes (draft) · 2015-10-23T23:44:22.159Z · score: 9 (9 votes) · LW · GW

Thanks for writing this post! I think it contains a number of insightful points.

You seem to be operating under the impression that subjective Bayesians think Bayesian statistical tools are always the best tools to use in different practical situations? That's likely true of many subjective Bayesians, but I don't think it's true of most "Less Wrong Bayesians." As far as I'm concerned, Bayesian statistics is not intended to handle logical uncertainty or reasoning under deductive limitation. It's an answer to the question "if you were logically omniscient, how should you reason?"

You provide examples where a deductively limited reasoner can't use Bayesian probability theory to get to the right answer, and where designing a prior that handles real-world data in a reasonable way is wildly intractable. Neat! I readily concede that deductively limited reasoners need to make use of a grab-bag of tools and heuristics depending on the situation. When a frequentist tool gets the job done fastest, I'll be first in line to use the frequentist tool. But none of this seems to bear on the philosophical question to which Bayesian probability is intended as an answer.

If someone does not yet have an understanding of thermodynamics and is still working hard to build a perpetual motion machine, then it may be quite helpful to teach them about the Carnot heat engine, as the theoretical ideal. Once it comes time for them to actually build an engine in the real world, they're going to have to resort to all sorts of hacks, heuristics, and tricks in order to build something that works at all. Then, if they come to me and say "I have lost faith in the Carnot heat engine," I'll find myself wondering what they thought the engine was for.

The situation is similar with Bayesian reasoning. For the masses who still say "you're entitled to your own opinion" or who use one argument against an army, it is quite helpful to tell them: Actually, the laws of reasoning are known. This is something humanity has uncovered. Given what you knew and what you saw, there is only one consistent assignment of probabilities to propositions. We know the most accurate way for a logically omniscient reasoner to reason. If they then go and try to do accurate reasoning, while under strong deductive limitations, they will of course find that they need to resort to all sorts of hacks, heuristics, and tricks, to reason in a way that even works at all. But if seeing this, they say "I have lost faith in Bayesian probability theory," then I'll find myself wondering about what they thought the framework was for.

From your article, I'm pretty sure you understand all this, in which case I would suggest that if you do post something like this to main, you consider a reframing. The Bayesians around these parts will very likely agree that (a) constructing a Bayesian prior that handles the real world is nigh impossible; (b) tools labeled "Bayesian" have no particular superpowers; and (c) when it comes time to solve practical real-world problems under deductive limitations, do whatever works, even if that's "frequentist".

Indeed, the Less Wrong crowd is likely going to be first in line to admit that constructing things-kinda-like-priors that can handle induction in the real world (sufficient for use in an AI system) is a massive open problem which the Bayesian framework sheds little light on. They're also likely to be quick to admit that Bayesian mechanics fails to provide an account of how deductively limited reasoners should reason, which is another gaping hole in our current understanding of 'good reasoning.'

I agree with you that deductively limited reasoners shouldn't pretend they're Bayesians. That's not what the theory is there for. It's there as a model of how logically omniscient reasoners could reason accurately, which was big news, given how very long it took humanity to think of themselves as anything like a reasoning engine designed to acquire bits of mutual information with the environment one way or another. Bayesianism is certainly not a panacea, though, and I don't think you need to convince too many people here that it has practical limitations.

That said, if you have example problems where a *logically omniscient* Bayesian reasoner who incorporates all your implicit knowledge into their prior would get the wrong answers, *those* I want to see, because those do bear on the philosophical question that I currently see Bayesian probability theory as providing an answer to--and if there's a chink in *that* armor, then I want to know :-)

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-08-29T15:38:04.559Z · score: 2 (2 votes) · LW · GW

Thanks!

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-08-27T21:47:12.259Z · score: 1 (1 votes) · LW · GW

Thanks!

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-08-27T14:48:07.212Z · score: 2 (2 votes) · LW · GW

Sweet! Thanks!

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-08-20T17:45:26.797Z · score: 2 (2 votes) · LW · GW

Nice. Thanks!

## MIRI's 2015 Summer Fundraiser!

2015-08-19T00:27:44.535Z · score: 42 (47 votes)

**so8res**on MIRI's Approach · 2015-07-31T02:45:31.840Z · score: 6 (6 votes) · LW · GW

Thanks again, Jacob. I don't have time to reply to all of this, but let me reply to one part:

> Once one acknowledges that the bit-exact 'best' solution either does not exist or cannot be found, then there is an enormous (infinite, really) space of potential solutions which have different tradeoffs in their expected utility in different scenarios/environments along with different cost structures. The most interesting solutions often are so complex that they are too difficult to analyze formally.

I don't buy this. Consider the "expert systems" of the seventies, which used curated databases of logical sentences and reasoned from those using a whole lot of ad-hoc rules. They could just as easily have said "Well, we need to build systems that deal with lots of special cases, and you can never be certain about the world. We cannot get exact solutions, and so we are doomed to the zone of heuristics and tradeoffs where the only interesting solutions are too complex to analyze formally." But they would have been wrong. There *were* tools and concepts and data structures that they were missing. Judea Pearl (and a whole host of others) showed up, formalized probabilistic graphical models, related them to Bayesian inference, and suddenly a whole class of ad-hoc solutions was superseded.
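For readers who haven't seen one, here is a minimal sketch of the kind of tool that superseded those ad-hoc rules: a three-node probabilistic graphical model whose joint distribution factorizes along its edges, queried by brute-force enumeration. The rain/sprinkler/wet-grass network and its numbers are the standard textbook toy, not anything from this thread, and enumeration is the simplest possible inference method rather than what practical systems use.

```python
from itertools import product

# Conditional probability tables for a toy network: Rain -> Sprinkler, and
# both Rain and Sprinkler -> WetGrass. Each entry is P(variable is True | parents).
P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}            # keyed by rain
P_WET = {(True, True): 0.99, (True, False): 0.9,  # keyed by (sprinkler, rain)
         (False, True): 0.8, (False, False): 0.0}

def bernoulli(p_true, value):
    """Probability that a binary variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(rain, sprinkler, wet):
    """The joint factorizes along the graph: P(R) * P(S | R) * P(W | S, R)."""
    return (bernoulli(P_RAIN, rain)
            * bernoulli(P_SPRINKLER[rain], sprinkler)
            * bernoulli(P_WET[(sprinkler, rain)], wet))

def p_rain_given_wet():
    """P(Rain | WetGrass) by summing the joint over the unobserved Sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence

print(round(p_rain_given_wet(), 4))  # 0.3577
```

The point is not the arithmetic but the structure: once the model is written down this way, "how much should wet grass raise my credence in rain?" stops being a matter of hand-tuned rules.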

So I don't buy that "we can't get exact solutions" implies "we're consigned to complex heuristics." People were using complicated ad-hoc rules to approximate logic, and then later they were using complex heuristics to approximate Bayesian inference, and this was *progress.*

My claim is that there are other steps like these that haven't been made yet: tools on the order of "causal graphical models" that we are still missing.

Imagine encountering a programmer from the future who knows how to program an AGI and asking them "How do you do that whole multi-level world-modeling thing? Can you show me the algorithm?" I strongly expect that they'd say something along the lines of "oh, well, you set up a system like this and then have it take percepts like that, and then you can see how if we run this for a while on lots of data it starts building multi-level descriptions of the universe. Here, let me walk you through what it looks like for the system to discover general relativity."

Since I don't know of a way to set up a system such that it would knowably and reliably start modeling the universe in this sense, I suspect that we're missing some tools.

I'm not sure whether your view is of the form "actually the programmer of the future would say 'I don't know how it's building a model of the world either, it's just a big neural net that I trained for a long time'" or whether it's of the form "actually we *do* know how to set up that system already", or whether it's something else entirely. But if it's the second one, then please tell! :-)

**so8res**on MIRI's Approach · 2015-07-30T21:55:29.732Z · score: 9 (9 votes) · LW · GW

It's cited a lot in MIRI's writing because it's the first example that pops into my mind, and I'm the one who wrote all the writings where it appears :-p

For other examples, see maybe "Artificial Evolution in the Physical World" (Thompson, 1997) or "Computational Genetics, Physiology, Metabolism, Neural Systems, Learning, Vision, and Behavior or PolyWorld: Life in a New Context." (Yaeger, 1994). IIRC.

## MIRI's Approach

2015-07-30T20:03:51.054Z · score: 34 (35 votes)

**so8res**on MIRI's Approach · 2015-07-30T17:08:23.045Z · score: 10 (10 votes) · LW · GW

Thanks for the reply, Jacob! You make some good points.

> Why not test safety long before the system is superintelligent? - say when it is a population of 100 child-like AGIs. As the population grows larger and more intelligent, the safest designs are propagated and made safer.

I endorse eli_sennesh's response to this part :-)

> This again reflects the old 'hard' computer science worldview, and obsession with exact solutions.

I am not under the impression that there are "exact solutions" available here. For example, in the case of "building world-models," you can't even get "exact" solutions using AIXI, which does Bayesian inference using a simplicity prior in order to guess what the environment looks like, and can never figure it out exactly. And this is in the simplified setting where AIXI is large enough to contain *all possible* environments! We, by contrast, need to understand algorithms which allow you to build a model of the world that you're *inside of*; exact solutions are clearly off the table (and, as eli_sennesh notes, huge amounts of statistical modeling are on it instead).
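To see why a simplicity-prior reasoner never pins the environment down exactly, here is a finite toy version. It is nothing like AIXI itself (which ranges over all computable environments and is uncomputable); the hypothesis class "the environment repeats some bit pattern forever" and the prior weight 4^-length are assumptions chosen only so the example is small and the weights converge.

```python
from fractions import Fraction
from itertools import product

def predict_next(observed, max_len=8):
    """P(next bit is '1' | observed bits) under a toy simplicity prior.

    Hypothesis class (assumed for illustration): "the environment repeats this
    bit pattern forever", for every pattern of up to max_len bits. A pattern of
    length L gets prior weight 4**-L, so shorter patterns are favored and the
    total weight per length (2**L patterns * 4**-L each) shrinks geometrically.
    """
    n = len(observed)
    weights = {}
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            pattern = "".join(bits)
            # Long enough to cover the observed prefix plus at least one more bit.
            generated = pattern * (n // length + 2)
            # Deterministic hypotheses: the Bayesian update just drops refuted ones.
            if generated.startswith(observed):
                weights[pattern] = Fraction(1, 4 ** length)
    total = sum(weights.values())
    p_one = sum(w for pattern, w in weights.items()
                if pattern[n % len(pattern)] == "1")
    return p_one / total

# "01" repeated is the simplest explanation, but longer patterns that agree on
# "0101" and then diverge keep some posterior weight.
print(predict_next("0101"))  # 15/574
```

However much data arrives, longer patterns that agree with it so far and diverge afterwards always survive, so the posterior stays spread out: good statistical modeling, but never an "exact solution."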

I would readily accept a statistical-modeling-heavy answer to the question of "but how do you build multi-level world-models from percepts, in principle?"; and indeed, I'd be astonished if you avoided it.

Perhaps you read "we need to know how to do X in principle before we do it in practice" as "we need a perfect algorithm that gives you bit-exact solutions to X"? That's an understandable reading; my apologies. Let me assure you again that we're not under the illusion you can get bit-exact solutions to most of the problems we're working on.

> For example - perhaps using lots and lots of computing power makes the problem harder instead of easier. How could that be? Because with lots and lots of compute power, you are naturally trying to extrapolate the world model far far into the future, where it branches enormously [...]

Hmm. If you have lots and lots of computing power, you can always just... not use it. It's not clear to me how additional computing power can make the problem *harder* -- at worst, it can make the problem *no easier*. I agree, though, that algorithms for modeling the world *from the inside* can't just extrapolate arbitrarily, on pain of exponential complexity; so whatever it takes to build and use multi-level world-models, it can't be *that.*

Perhaps the point where we disagree is that you think these hurdles suggest that figuring out how to do things we can't yet do in principle is hopeless, whereas I'm under the impression that these shortcomings highlight places where we're still confused?

## MIRI Fundraiser: Why now matters

2015-07-24T22:38:06.131Z · score: 28 (29 votes)

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-22T03:21:03.906Z · score: 6 (6 votes) · LW · GW

Nice! Thanks -- that will be a fun goal to hit.

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-22T03:19:29.904Z · score: 18 (18 votes) · LW · GW

> Hi Nate, can you briefly describe this second approach?

Yep! This is a question we've gotten a few times already, and the answer will likely appear in a blog post later in the fundraiser. In the interim, the short version is that there are a few different promising candidates for a second approach, and we haven't settled yet on exactly which would be next in line. (This is one of the reasons why our plans extend beyond $6M.) I can say that the new candidates would still be aimed towards ensuring that the creation of human-programmed AGI goes well -- the other pathways (whole-brain emulation, etc.) are very important, but they aren't within our purview. It's not clear yet whether we'd focus on new direct approaches to the technical problems (such as Paul Christiano's "can we reduce this problem to reliable predictions about human behavior" approach) or whether we'd focus on projects that would be particularly exciting to modern AI professionals or modern security professionals, in attempts to build stronger bridges to academia.

In fact, I'd actually be quite curious about which approaches you think are the most promising before deciding.

> On another note, do you know anything about Elon Musk possibly having changed his mind about the threat of AI and how that might affect future funding of work in this area?

I wasn't at the ICML workshop, so I can't say much about how that summary was meant to be interpreted. That said, I wouldn't read too much into it: "Hassabis has convinced Musk" doesn't tell us much about what Demis claimed. Best I can guess from the context is that he said he convinced Elon that overhyping concern about AI could be harmful, but it's hard to be sure.

I can say, however, that I'm in contact with both Elon and Demis, and that I'm not currently worried about Elon disappearing into the mist :-)

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T21:23:50.540Z · score: 7 (7 votes) · LW · GW

Thanks Davis! We'll try to keep up the pace.

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T19:46:16.792Z · score: 8 (8 votes) · LW · GW

Amazing. Thanks!

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T15:39:08.757Z · score: 6 (6 votes) · LW · GW

Thanks :-)

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T15:38:39.755Z · score: 7 (7 votes) · LW · GW

Thanks!

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T02:13:24.999Z · score: 9 (9 votes) · LW · GW

Thanks Luke! :-D

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T01:29:28.077Z · score: 9 (9 votes) · LW · GW

Woah. Thanks!

**so8res**on MIRI's 2015 Summer Fundraiser! · 2015-07-21T01:09:54.014Z · score: 7 (7 votes) · LW · GW

Thanks!

## Taking the reins at MIRI

2015-06-03T23:52:28.074Z · score: 62 (63 votes)

## The Stamp Collector

2015-05-01T23:11:22.661Z · score: 25 (27 votes)

**so8res**on Why isn't the following decision theory optimal? · 2015-04-18T21:23:36.463Z · score: 1 (1 votes) · LW · GW

> the 'updateless' part does seem very similar to the "act as if you had precommitted to any action that you'd have wanted to precommit to" core idea of NDT

Yep, that's a common intuition pump people use in order to understand the "updateless" part of UDT.

> It's not clear to me that the super powerful UDT would make the wrong decision in the game where two players pick numbers between 0-10

A proof-based UDT agent would -- this follows from the definition of proof-based UDT. Intuitively, we surely *want* a decision theory that reasons as you said, but the question is, can you write down a decision algorithm that *actually* reasons like that?

Most people agree with you on the philosophy of how an idealized decision theory *should* act, but the hard part is formalizing a decision theory that *actually does the right things.* The difficult part isn't in the philosophy, the difficult part is turning the philosophy into math :-)

**so8res**on Why isn't the following decision theory optimal? · 2015-04-17T15:19:27.907Z · score: 2 (2 votes) · LW · GW

> Has anyone written specifically on how exactly to give weights to logical connections between similar but non-identical entities?

Nope! That's the open part of the problem :-) We don't know how to build a decision network with logical nodes, and we don't know how to propagate a "logical update" between nodes. (That is, we don't have a good formalism of how changing one algorithm logically affects a related but non-identical algorithm.)

If we had the latter thing, we wouldn't even need the "logical decision network", because we could just ask "if I change the agent, how does that logically affect the universe?" (as both are algorithms); this idea is the basis of proof-based UDT (which tries to answer the problem by searching for proofs under the assumption "Agent()=a" for various actions). Proof-based UDT has lots of problems of its own, though, and thinking about logical updates in logical graphs is a fine angle of approach.
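To make the question "if I change the agent, how does that affect the universe?" concrete, here is a much cruder stand-in than proof-based UDT, under loudly stated assumptions: a toy Newcomb-like universe (the hypothetical `universe` function) that receives the agent as an argument, and a hypothetical `choose` that simply re-runs that universe with every call to the agent replaced by a constant action. This sidesteps exactly the hard part (a real agent is embedded in the universe rather than handed to it, and related-but-non-identical algorithms are not handled at all), so read it as an illustration of the question, not an answer to it.

```python
def universe(agent):
    """Hypothetical Newcomb-like universe: the predictor fills the opaque box by
    running the agent's own code, then the agent chooses. Returns the payoff."""
    prediction = agent()
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    choice = agent()
    return opaque_box if choice == "one-box" else opaque_box + 1_000

def choose(actions):
    """Toy 'logical counterfactual': evaluate the universe with the agent replaced
    by a constant for each action (no proof search, unlike proof-based UDT)."""
    return max(actions, key=lambda a: universe(lambda: a))

print(choose(["one-box", "two-box"]))  # one-box
```

Substitution works here only because the agent appears by name in a tiny, fully computable universe; the open problem is saying what "replacing the agent" means when neither condition holds.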

**so8res**on Why isn't the following decision theory optimal? · 2015-04-17T01:28:23.050Z · score: 1 (1 votes) · LW · GW

> I think defect is the right answer in your AI problem and therefore that NDT gets it right

That's surprising to me. Imagine that the situation is "prisoner's dilemma with shared source code", and that the AIs inspect each other's source code and verify that (by some logical but non-causal miracle) they have exactly identical source code. Do you still think they do better to defect? *I* wouldn't want to build an agent that defects in that situation :-p
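Here is a minimal sketch of why identical source code changes the answer, assuming the conventional textbook Prisoner's Dilemma payoffs (3/0/5/1, not taken from this thread). A reasoner that holds the opponent's move fixed finds that defection dominates; a reasoner that knows the opponent runs exactly its own code knows only the diagonal outcomes (C, C) and (D, D) are possible, and cooperates. The two function names are hypothetical.

```python
# Row player's payoff for (my_move, their_move); conventional PD numbers (assumed).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")

def dominance_choice():
    """Treat the opponent's move as fixed and independent of mine: pick a move
    that is at least as good as every alternative against every opponent move."""
    for mine in MOVES:
        if all(PAYOFF[(mine, theirs)] >= PAYOFF[(other, theirs)]
               for other in MOVES for theirs in MOVES):
            return mine
    raise ValueError("no dominant move")

def identical_source_choice():
    """The opponent runs exactly my source code, so its output equals mine:
    only (C, C) and (D, D) are logically possible."""
    return max(MOVES, key=lambda mine: PAYOFF[(mine, mine)])

print(dominance_choice(), identical_source_choice())  # D C
```

This only covers the easy case of exact source equality; the hard part is what to do when the two algorithms are merely similar.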

The paper that jessicat linked in the parent post is a decent introduction to the notion of logical counterfactuals. See also the "Idealized Decision Theory" section of this annotated bibliography, and perhaps also this short sequence I wrote a while back.

**so8res**on Why isn't the following decision theory optimal? · 2015-04-17T00:18:17.919Z · score: 3 (3 votes) · LW · GW

The universe begins, and then almost immediately, two different alien species make AIs while spacelike separated. The AIs start optimizing their light cones and meet in the middle, and must play a Prisoner's Dilemma.

There is absolutely no causal relationship between them before the PD, so it doesn't matter what precommitments they would have made at the beginning of time :-)

To be clear, this sort of thought experiment is meant to demonstrate why your NDT is not optimal; it's not meant to be a feasible example. The reason we're trying to formalize "logical effect" is *not* specifically so that our AIs can cooperate with independently developed alien AIs or something (although that would be a fine perk). Rather, this extreme example is intended to demonstrate why idealized counterfactual reasoning needs to take logical effects into account. Other thought experiments can be used to show that reasoning about logical effects matters in more realistic scenarios, but first it's important to realize that they matter at all :-)

**so8res**on Why isn't the following decision theory optimal? · 2015-04-16T22:13:49.664Z · score: 4 (4 votes) · LW · GW

In the retro blackmail, CDT does not precommit to refusing *even if* it's given the opportunity to do so before the researcher gets its source code. This is because CDT believes that the researcher is predicting according to a causally disconnected copy of itself, and therefore it does not believe that its actions can affect the copy. (That is, if CDT knows it is going to be retro blackmailed, and considers this before the researcher gets access to its source code, then it still doesn't precommit.) The failure here is that CDT only reasons according to what it can *causally* affect, but in the real world decision algorithms also need to worry about what they can *logically* affect. (For example, two agents created while spacelike separated should be able to cooperate on a Prisoner's Dilemma.)

Your attempted patch (pretend you made your precommitments earlier in time) only works when the neglected logical relationships stem from a causal event earlier in time. This is often but not always the case. For instance, if CDT thinks that its clone was *causally* copied from its own source code, then you can get the right answer by acting as CDT would have precommitted to act before the copying occurred. But two agents written in spacelike separation from each other might have decision algorithms that are *logically* correlated, despite there being no *causal* connection no matter how far back you go.

In order to get the right precommitments in those sorts of scenarios, you need to formalize some sort of notion of "things the decision algorithm's choice *logically* affects," and formalizing "logical effects" is basically the part of the problem that remains difficult :-)

**so8res**on Ephemeral correspondence · 2015-04-12T01:47:23.130Z · score: 2 (2 votes) · LW · GW

Thanks! I agree that this isn't the best set-up for getting people interested in instrumental rationality, but remember that these essays were generated as a set of things to say before reading Rationality: From AI to Zombies -- they're my take on the unstated background assumptions behind R:A-Z, meant to motivate it a little better before people pick it up. For that reason, the essays have a strong "why epistemics?" bent :-)