lukeprog feed - LessWrong 2.0 Reader lukeprog’s posts and comments on LessWrong en-us Comment by lukeprog on Which scientific discovery was most ahead of its time? https://www.lesswrong.com/posts/Qz6KTt5z3eu4pyKZB/which-scientific-discovery-was-most-ahead-of-its-time#toKY2gsQXmcA9KY97 <p>Cases where scientific knowledge was in fact lost and then rediscovered provide especially strong evidence about the discovery counterfactuals, e.g. <a href="http://lukemuehlhauser.com/industrial-revolution/#footnote_49_3108">Hero&#x27;s eolipile</a> and <a href="https://www.tandfonline.com/doi/abs/10.1198/tas.2011.10191">al-Kindi&#x27;s development</a> of relative frequency analysis for decoding messages. Probably we underestimate how common such cases are, because the knowledge of the lost discovery is itself lost — e.g. we might easily have simply not rediscovered the <a href="http://lukemuehlhauser.com/industrial-revolution/#footnote_50_3108">Antikythera mechanism</a>.</p> lukeprog toKY2gsQXmcA9KY97 2019-05-16T14:42:59.024Z Comment by lukeprog on Preliminary thoughts on moral weight https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight#kjQktDqSYQvhWhqpY <p>Apparently Shelly Kagan has <a href="https://global.oup.com/academic/product/how-to-count-animals-more-or-less-9780198829676?lang=en&cc=de">a book</a> coming out soon that is (sort of?) about moral weight.</p> lukeprog kjQktDqSYQvhWhqpY 2018-10-24T19:38:04.818Z Comment by lukeprog on A Proper Scoring Rule for Confidence Intervals https://www.lesswrong.com/posts/sRfZxF2sjrRGdgyb8/a-proper-scoring-rule-for-confidence-intervals#Rkw6wFub4nd2FwhxK <p>This scoring rule has some downsides from a usability standpoint. See <a href="https://arxiv.org/abs/1808.07501">Greenberg 2018</a>, a whitepaper prepared as background material for a (forthcoming) calibration training app. </p> lukeprog Rkw6wFub4nd2FwhxK 2018-08-29T17:48:22.643Z Comment by lukeprog on Preliminary thoughts on moral weight https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight#A8BmKSmsspufwNTKr <p>Some other people at Open Phil have spent more time thinking about two-envelope effects than I have, and fwiw some of their thinking on the issue is in <a href="https://www.openphilanthropy.org/blog/update-cause-prioritization-open-philanthropy">this post</a> (e.g. see section 1.1.1.1).</p> lukeprog A8BmKSmsspufwNTKr 2018-08-16T02:33:24.309Z Comment by lukeprog on Preliminary thoughts on moral weight https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight#ruq6FZc7L99jzMJQq <p>My own take on this is described briefly <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#Metaethical">here</a>, with more detail in various appendices, e.g. <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#ExtremeEffort">here</a>.</p> lukeprog ruq6FZc7L99jzMJQq 2018-08-14T19:03:58.337Z Comment by lukeprog on Preliminary thoughts on moral weight https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight#kN7PN57jaQBTybGKc <p>Yes, I meant to be describing ranges conditional on each species being moral patients at all. I previously gave my own (very made-up) probabilities for that <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#Probabilities">here</a>. 
Another worry to consider, though, is that many biological/cognitive and behavioral features of a species are simultaneously (1) evidence about their likelihood of moral patienthood (via consciousness), and (2) evidence about features that might affect their moral weight *given* consciousness/patienthood. So, depending on how you use that evidence, it&#x27;s important to watch out for double-counting.</p><p></p><p>I&#x27;ll skip responding to #2 for now.</p> lukeprog kN7PN57jaQBTybGKc 2018-08-14T19:02:27.195Z Comment by lukeprog on Preliminary thoughts on moral weight https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight#eMZCfNhswi5d6JkjY <p>For anyone who is curious, I cite much of the literature arguing over criteria for moral patienthood/weight in the footnotes of <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#ProposedCriteria">this section</a> of my original moral patienthood report. My brief comments on why I&#x27;ve focused on consciousness thus far are <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#WhyI">here</a>.</p> lukeprog eMZCfNhswi5d6JkjY 2018-08-14T18:57:48.389Z Preliminary thoughts on moral weight https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight <p><em>This post adapts some internal notes I wrote for the Open Philanthropy Project, but they are merely at a &quot;brainstorming&quot; stage, and do not express my &quot;endorsed&quot; views nor the views of the Open Philanthropy Project. This post is also written quickly and not polished or well-explained.</em></p><p>My <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood">2017 Report on Consciousness and Moral Patienthood</a> tried to address the question of &quot;Which creatures are moral patients?&quot; but it did little to address the question of &quot;moral weight,&quot; i.e. how to weigh the interests of different kinds of moral patients against each other:</p><blockquote>For example: suppose we conclude that fishes, pigs, and humans are all moral patients, and we estimate that, for a fixed amount of money, we can (in expectation) dramatically improve the welfare of (a) 10,000 rainbow trout, (b) 1,000 pigs, or (c) 100 adult humans. In that situation, how should we compare the different options? This depends (among other things) on how much “moral weight” we give to the well-being of different kinds of moral patients.</blockquote><p>Thus far, philosophers have said very little about moral weight (see below). In this post I lay out one approach to thinking about the question, in the hope that others might build on it or show it to be misguided.</p><h1>Proposed setup</h1><p>For the simplicity of a first-pass analysis of moral weight, let&#x27;s assume a variation on classical utilitarianism according to which the only thing that morally matters is the moment-by-moment character of a being&#x27;s conscious experience. So e.g. it doesn&#x27;t matter whether a being&#x27;s rights are respected/violated or its preferences are realized/thwarted, except insofar as those factors affect the moment-by-moment character of the being&#x27;s conscious experience, by causing pain/pleasure, happiness/sadness, etc. 
</p><p>Next, and again for simplicity&#x27;s sake, let&#x27;s talk only about the &quot;typical&quot; conscious experience of &quot;typical&quot; members of different species when undergoing various &quot;canonical&quot; positive and negative experiences, e.g. consuming species-appropriate food or having a nociceptor-dense section of skin damaged.</p><p>Given those assumptions, when we talk about the relative &quot;moral weight&quot; of different species, we mean to ask something like &quot;How morally important is 10 seconds of a typical human&#x27;s experience of [some injury], compared to 10 seconds of a typical rainbow trout&#x27;s experience of [that same injury]?&quot;</p><p>For this exercise, I&#x27;ll separate &quot;moral weight&quot; from &quot;probability of moral patienthood.&quot; Naively, you could then multiply your best estimate of a species&#x27; moral weight (using humans as the baseline of 1) by P(moral patienthood) to get the species&#x27; &quot;expected moral weight&quot; (or whatever you want to call it). Then, to estimate an intervention&#x27;s potential benefit for a given species, you could multiply [expected moral weight of species] × [individuals of species affected] × [average # of minutes of conscious experience affected across those individuals] × [average magnitude of positive impact on those minutes of conscious experience].</p><p>However, I say &quot;naively&quot; because <em>this doesn&#x27;t actually work</em>, due to <a href="https://reducing-suffering.org/two-envelopes-problem-for-brain-size-and-moral-uncertainty/">two-envelope effects</a>.</p><h1>Potential dimensions of moral weight</h1><p>What features of a creature&#x27;s conscious experience might be relevant to the moral weight of its experiences? Below, I describe some possibilities that I previously mentioned in <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#AppendixZ7">Appendix Z7</a> of my moral patienthood report.</p><p>Note that any of the features below could be (and in some cases, very likely are) hugely multidimensional. For simplicity, I&#x27;m going to assume a <em>unidimensional</em> characterization of them, e.g. what we&#x27;d get if we looked only at the principal component in a principal component analysis of a hugely multidimensional phenomenon.</p><h2>Clock speed of consciousness</h2><p>Perhaps animals vary in their &quot;clock speed.&quot; E.g. a hummingbird reacts to some things much faster than I ever could. If <em>any</em> of that is under conscious control, its &quot;clock speed&quot; of conscious experience seems like it should be faster than mine, meaning that, intuitively, it should have a greater number of subjective &quot;moments of consciousness&quot; per objective minute than I do.</p><p>In general, smaller animals probably have faster clock speeds than larger ones, for mechanical reasons:</p><blockquote>The natural oscillation periods of most consciously controllable human body parts are greater than a tenth of a second. Because of this, the human brain has been designed with a matching reaction time of roughly a tenth of a second. As it costs more to have faster reaction times, there is little point in paying to react much faster than body parts can change position.</blockquote><blockquote>…the first resonant period of a bending cantilever, that is, a stick fixed at one end, is proportional to its length, at least if the stick’s thickness scales with its length. 
For example, sticks twice as long take twice as much time to complete each oscillation. Body size and reaction time are predictably related for animals today… (<a href="https://global.oup.com/academic/product/the-age-of-em-9780198754626?cc=us&lang=en&">Hanson 2016</a>, ch. 6)</blockquote><p>My impression is that it&#x27;s a common intuition to value experience by its &quot;subjective&quot; duration rather than its &quot;objective&quot; duration, with no discount. So if a hummingbird&#x27;s clock speed is 3x as fast as mine, then all else equal, an objective minute of its conscious pleasure would be worth 3x an objective minute of my conscious pleasure.</p><h2>Unities of consciousness</h2><p>Philosophers and cognitive scientists debate how &quot;unified&quot; consciousness is, in various ways. Our normal conscious experience <em>seems</em> to many people to be pretty &quot;unified&quot; in various ways, though sometimes it feels less unified, for example when one goes &quot;in and out of consciousness&quot; during a restless night&#x27;s sleep, or when one engages in certain kinds of meditative practices.</p><p>Daniel Dennett suggests that animal conscious experience is radically less unified than human consciousness is, and <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#Anthropomorphism">cites this as a major reason he doesn&#x27;t give most animals much moral weight</a>.</p><p>For convenience, I&#x27;ll use <a href="https://global.oup.com/academic/product/the-unity-of-consciousness-9780199215386?cc=us&lang=en&">Bayne (2010)</a>&#x27;s taxonomy of types of unity. He talks about subject unity, representational unity, and phenomenal unity — each of which has a &quot;synchronic&quot; (momentary) and &quot;diachronic&quot; (across time) aspect of unity.</p><h3>Subject unity</h3><p>Bayne explains:</p><blockquote>My conscious states possess a certain kind of unity insofar as they are all mine; likewise, your conscious states possess that same kind of unity insofar as they are all yours. We can describe conscious states that are had by or belong to the same subject of experience as subject unified. Within subject unity we need to distinguish the unity provided by the subject of experience across time (diachronic unity) from that provided by the subject at a time (synchronic unity).</blockquote><h3>Representational unity</h3><p>Bayne explains:</p><blockquote>Let us say that conscious states are representationally unified to the degree that their contents are integrated with each other. Representational unity comes in a variety of forms. A particularly important form of representational unity concerns the integration of the contents of consciousness around perceptual objects—what we might call ‘object unity’. Perceptual features are not normally represented by isolated states of consciousness but are bound together in the form of integrated perceptual objects. This process is known as feature-binding. Feature-binding occurs not only within modalities but also between them, for we enjoy multimodal representations of perceptual objects.</blockquote><p>I suspect many people wouldn&#x27;t treat representational unity as all that relevant to moral weight. E.g. there are humans with low representational unity of a sort (e.g. 
<a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#Dee">visual agnosics</a>); are their sensory experiences less morally relevant as a result?</p><h3>Phenomenal unity</h3><p>Bayne explains:</p><blockquote>Subject unity and representational unity capture important aspects of the unity of consciousness, but they don’t get to the heart of the matter. Consider again what it’s like to hear a rumba playing on the stereo whilst seeing a bartender mix a mojito. These two experiences might be subject unified insofar as they are both yours. They might also be representationally unified, for one might hear the rumba as coming from behind the bartender. But over and above these unities is a deeper and more primitive unity: the fact that these two experiences possess a conjoint experiential character. There is something it is like to hear the rumba, there is something it is like to see the bartender work, and there is something it is like to hear the rumba while seeing the bartender work. Any description of one’s overall state of consciousness that omitted the fact that these experiences are had together as components, parts, or elements of a single conscious state would be incomplete. Let us call this kind of unity — sometimes dubbed ‘co-consciousness’ — phenomenal unity.</blockquote><blockquote>Phenomenal unity is often in the background in discussions of the ‘stream’ or ‘field’ of consciousness. The stream metaphor is perhaps most naturally associated with the flow of consciousness — its unity through time — whereas the field metaphor more accurately captures the structure of consciousness at a time. We can say that what it is for a pair of experiences to occur within a single phenomenal field just is for them to enjoy a conjoint phenomenality — for there to be something it is like for the subject in question not only to have both experiences but to have them together. By contrast, simultaneous experiences that occur within distinct phenomenal fields do not share a conjoint phenomenal character.</blockquote><h2>Unity-independent intensity of valenced aspects of consciousness</h2><p>A common report of those who take psychedelics is that, while &quot;tripping,&quot; their conscious experiences are &quot;more intense&quot; than they normally are. Similarly, different pains feel similar but have different intensities, e.g. when my stomach is upset, the intensity of my stomach pain waxes and wanes a fair bit, until it gradually fades to not being noticeable anymore. Same goes for conscious pleasures.</p><p>It&#x27;s possible such variations in intensity are entirely accounted for by their degrees of different kinds of unity, or by some other plausible feature(s) of moral weight, but maybe not. If there <em>is</em> some additional &quot;intensity&quot; variable for valenced aspects of conscious experience, it would seem a good candidate for affecting moral weight.</p><p>From my own experience, my guess is that I would endure ~10 seconds of the most intense pain I&#x27;ve ever experienced to avoid experiencing ~2 months of the lowest level of discomfort that I&#x27;d bother to call &quot;discomfort.&quot; That very low level of discomfort might suggest a lower bound on &quot;intensity of valenced aspects of experience&quot; that I intuitively morally care about, but &quot;the most intense pain I&#x27;ve ever experienced&quot; probably is not the <em>highest</em> intensity of valenced aspects of experience it is possible to experience — probably not even close. 
You could consider similar trades to get a sense for how much you intuitively value &quot;intensity of experience,&quot; at least in your own case.</p><h1>Moral weights of various species</h1><p>If we thought about all this more carefully and collected as much relevant empirical data as possible, what moral weights might we assign to different species?</p><p>Whereas my <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#Probabilities">probabilities of moral patienthood</a> for any animal as complex as a crab only range from 0.2 - 1, the plausible ranges of moral weight seem like they could be much larger. I don&#x27;t feel like I&#x27;d be surprised if an omniscient being told me that my <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#ExtremeEffort">extrapolated values</a> would assign pigs <em>more</em> moral weight than humans, and I don&#x27;t feel like I&#x27;d be surprised if an omniscient being told me my extrapolated values would assign pigs .0001 moral weight (assuming they were moral patients at all). </p><p>To illustrate how this might work, below are some guesses at some &quot;plausible ranges of moral weight&quot; for a variety of species that someone might come to, if they had intuitions like those explained below.</p><ul><li>Humans: 1 (baseline)</li><li>Chimpanzees: 0.001 - 3</li><li>Pigs: 0.0005 - 3.5</li><li>Cows: 0.0001 - 5</li><li>Chickens: 0.0005 - 7</li><li>Rainbow trout: 0.00005 - 10</li><li>Fruit fly: 0.000001 - 20</li></ul><p>(But whenever you&#x27;re tempted to multiply such numbers by something, remember <a href="https://reducing-suffering.org/two-envelopes-problem-for-brain-size-and-moral-uncertainty/">two-envelope effects</a>!)</p><p>What intuitions might lead to something like these ranges?</p><ul><li>An intuition to not place much value on &quot;complex/higher-order&quot; dimensions of moral weight — such as &quot;fullness of self-awareness&quot; or &quot;capacity for reflecting on one&#x27;s holistic life satisfaction&quot; — above and beyond the subjective duration and &quot;intensity&quot; of relatively &quot;brute&quot; pleasure/pain/happiness/sadness that (in humans) tends to accompany reflection, self-awareness, etc.</li><li>An intuition to care more about subject unity and phenomenal unity than about such higher-order dimensions of moral weight.</li><li>An intuition to care most of all about clock speed and experience intensity (if intensity is distinct from unity).</li><li>Intuitions that if the animal species listed above are conscious, they:</li><ul><li>have very little of the higher-order dimensions of conscious experience, </li><li>have faster clock speeds than humans (the smaller the faster), </li><li>probably have lower &quot;intensity&quot; of experience, but <em>might</em> actually have somewhat <em>greater</em> intensity of experience (e.g. 
because they aren&#x27;t distracted by linguistic thought), </li><li>have moderately less subject unity and phenomenal unity, especially of the diachronic sort.</li></ul></ul><p>Under these intuitions, the low end of the ranges above could be explained by the possibility that intensity of conscious experience diminishes dramatically as brain complexity and flexibility decrease, while the high end of the ranges above could be explained by the possibility of faster clock speeds for smaller animals, the possibility of lesser unity in non-human animals (which one might value at &gt;1x for the same reason one might value a dually-conscious split-brain patient at ~2x), and the possibility of <em>greater</em> intensity of experience in simpler animals.</p><h1>Other writings on moral weight</h1><ul><li>Brian Tomasik: <u><a href="http://reducing-suffering.org/suffering-in-animals-vs-humans/#Is_animal_suffering_less_bad_than_human_suffering">Is animal suffering less bad than human suffering?</a></u>; <u><a href="http://reducing-suffering.org/which-computations-do-i-care-about/">Which computations do I care about?</a></u>; <u><a href="http://reducing-suffering.org/is-brain-size-morally-relevant/">Is brain size morally relevant?</a></u>; <u><a href="http://reducing-suffering.org/small-animals-clock-speed/">Do Smaller Animals Have Faster Subjective Experiences?</a></u>; <u><a href="http://reducing-suffering.org/two-envelopes-problem-for-brain-size-and-moral-uncertainty/">Two-Envelopes Problem for Uncertainty about Brain-Size Valuation and Other Moral Questions</a></u></li><li>Nick Bostrom: <a href="https://nickbostrom.com/papers/experience.pdf">Quantity of Experience</a></li><li>Kevin Wong: <u><a href="http://www.academia.edu/34132984/Counting_Animals_On_Interspecies_Commensurability_and_the_Prospect_of_Effective_Altruism">Counting Animals</a></u></li><li>Oscar Horta: <u><a href="https://masalladelaespecie.files.wordpress.com/2010/05/questions_priority_interspecies.pdf">Questions of Priority and Interspecies Comparisons of Happiness</a></u></li><li>Adler et al., <u><a href="http://www.sciencedirect.com/science/article/pii/S0167268117301270">Would you choose to be happy? Tradeoffs between happiness and the other dimensions of life in a large population survey</a></u></li></ul> lukeprog 2jTQTxYNwo6zb3Kyp 2018-08-13T23:45:13.430Z Comment by lukeprog on Announcement: AI alignment prize winners and next round https://www.lesswrong.com/posts/4WbNGQMvuFtY3So7s/announcement-ai-alignment-prize-winners-and-next-round#RTKKDNDgck37qnGYR <p>Cool, this looks better than I&#x27;d been expecting. Thanks for doing this! Looking forward to next round.</p> lukeprog RTKKDNDgck37qnGYR 2018-01-15T21:42:23.386Z Quick thoughts on empathic metaethics https://www.lesswrong.com/posts/FZ5aTJFXZMpPL7ycK/quick-thoughts-on-empathic-metaethics <p><em>Years ago, I wrote an unfinished sequence of posts called &quot;<a href="https://wiki.lesswrong.com/wiki/No-Nonsense_Metaethics">No-Nonsense Metaethics</a>.&quot; My last post, <a href="https://www.lesserwrong.com/posts/3zDX3f3QTepNeZHGc/pluralistic-moral-reductionism">Pluralistic Moral Reductionism</a>, said I would next explore &quot;empathic metaethics,&quot; but I never got around to writing those posts. 
Recently, I wrote a high-level summary of some initial thoughts on &quot;empathic metaethics&quot; in <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#ExtremeEffort">section 6.1.2</a> of a report prepared for my employer, the <a href="https://www.openphilanthropy.org/">Open Philanthropy Project</a>. With my employer&#x27;s permission, I&#x27;ve adapted that section for publication here, so that it can serve as the long-overdue concluding post in my sequence on metaethics.</em></p><p>In my <a href="https://www.lesserwrong.com/posts/3zDX3f3QTepNeZHGc/pluralistic-moral-reductionism">previous post</a>, I distinguished &quot;austere metaethics&quot; and &quot;empathic metaethics,&quot; where austere metaethics confronts moral questions roughly like this:</p><blockquote>Tell me what you mean by &#x27;right&#x27;, and I will tell you what is the right thing to do. If by &#x27;right&#x27; you mean X, then Y is the right thing to do. If by &#x27;right&#x27; you mean P, then Z is the right thing to do. But if you can&#x27;t tell me what you mean by &#x27;right&#x27;, then you have failed to ask a coherent question, and no one can answer an incoherent question.</blockquote><p>Meanwhile, empathic metaethics says instead:</p><blockquote>You may not know what you mean by &#x27;right.&#x27; But let&#x27;s not stop there. Here, let me come alongside you and help decode the cognitive algorithms that generated your question in the first place, and then we&#x27;ll be able to answer your question. Then we can tell you what the right thing to do is.</blockquote><p>Below, I provide a high-level summary of some of my initial thoughts on what one approach to &quot;empathic metaethics&quot; could look like.</p><p>Given my metaethical approach, when I make a “moral judgment” about something (e.g. about <a href="https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood">which kinds of beings are moral patients</a>), I don’t conceive of myself as perceiving an objective moral truth, or coming to know an objective moral truth via a series of arguments. Nor do I conceive of myself as merely expressing my moral feelings as they stand today. Rather, I conceive of myself as making a conditional forecast about what my values would be if I underwent a certain “idealization” or “extrapolation” procedure (coming to know more true facts, having more time to consider moral arguments, etc.).[1]</p><p>Thus, in a (hypothetical) &quot;extreme effort&quot; attempt to engage in empathic metaethics (for thinking about <em>my own</em> moral judgments), I would do something like the following:</p><ol><li>I would try to make the scenario I&#x27;m aiming to forecast as concrete as possible, so that my brain is able to treat it as a genuine forecasting challenge, akin to participating in a prediction market or forecasting tournament, rather than as a fantasy about which my brain feels &quot;allowed&quot; to make up whatever story feels nice, or signals my values to others, or achieves something else that isn&#x27;t <em>forecasting accuracy</em>.[2] In my case, I concretize the extrapolation procedure as one involving a large population of copies of me who learn many true facts, consider many moral arguments, and undergo various other experiences, and then collectively advise me about what I should value and why.[3]</li><li>However, I would also try to make forecasts I can actually check for accuracy, e.g. 
about what my moral judgment about various cases will be 2 months in the future.</li><li>When making these forecasts, I would try to draw on the best research I&#x27;ve seen concerning how to make accurate estimates and forecasts. For example I would try to &quot;think like a fox, not like a hedgehog,&quot; and I&#x27;ve already done several hours of probability calibration training, and some amount of forecasting training.[4]</li><li>Clearly, my current moral intuitions would serve as one important source of evidence about what my extrapolated values might be. However, recent findings in moral psychology and related fields lead me to assign more evidential weight to some moral intuitions than to others. More generally, I interpret my current moral intuitions as data generated partly by my moral principles, and partly by various &quot;error processes&quot; (e.g. a hard-wired disgust reaction to spiders, which I don&#x27;t endorse upon reflection). Doing so allows me to make use of some standard lessons from statistical curve-fitting when thinking about how much evidential weight to assign to particular moral intuitions.[5]</li><li>As part of forecasting what my extrapolated values might be, I would try to consider different processes and contexts that could generate alternate moral intuitions in moral reasoners both similar and dissimilar to my current self, and I would try to consider how I feel about the &quot;legitimacy&quot; of those mechanisms as producers of moral intuitions. For example I might ask myself questions such as &quot;How might I feel about that practice if I was born into a world in which it was already commonplace?&quot; and &quot;How might I feel about that case if my built-in (and largely unconscious) processes for associative learning and imitative learning had been exposed to different life histories than my own?&quot; and &quot;How might I feel about that case if I had been born in a different century, or a different country, or with a greater propensity for clinical depression?&quot; and &quot;How might a moral reasoner on another planet feel about that case if it belonged to a more strongly <a href="https://en.wikipedia.org/wiki/R/K_selection_theory">r-selected species</a> (compared to humans) but had roughly human-like general reasoning ability?&quot;[6]</li><li>Observable patterns in how people&#x27;s values change (seemingly) in response to components of my proposed extrapolation procedure (learning more facts, considering moral arguments, etc.) would serve as another source of evidence about what my extrapolated values might be. For example, the correlation between aggregate human knowledge and our &quot;expanding circle of moral concern&quot; (<a href="http://press.princeton.edu/titles/9434.html">Singer 2011</a>) might (very weakly) suggest that, if I continued to learn more true facts, my circle of moral concern would continue to expand. Unfortunately, such correlations are badly confounded, and might not provide much evidence.[7]</li><li>Personal facts about how my own values have evolved as I&#x27;ve learned more, considered moral arguments, and so on, would serve as yet another source of evidence about what my extrapolated values might be. 
Of course, these relations are likely confounded as well, and need to be interpreted with care.[8]</li></ol><hr class="dividerBlock"/><p><strong>1.</strong> This general approach sometimes goes by names such as &quot;ideal advisor theory&quot; or, arguably, &quot;reflective equilibrium.&quot; Diverse sources explicating various extrapolation procedures (or fragments of extrapolation procedures) include: <a href="http://www.journals.uchicago.edu/doi/10.1086/293702">Rosati (1995)</a>; <a href="http://plato.stanford.edu/entries/reflective-equilibrium/">Daniels (2016)</a>; <a href="http://www.tandfonline.com/doi/full/10.1080/00048402.2013.833643">Campbell (2013)</a>; chapter 9 of <a href="https://www.polity.co.uk/book.asp?ref=9780745646596">Miller (2013)</a>; <a href="https://intelligence.org/files/IdealAdvisorTheories.pdf">Muehlhauser &amp; Williamson (2013)</a>; <a href="http://jme.bmj.com/content/40/5/303.short">Trout (2014)</a>; Yudkowsky&#x27;s &quot;<a href="https://arbital.com/p/normative_extrapolated_volition/">Extrapolated volition (normative moral theory)</a>&quot; (2016); <a href="http://onlinelibrary.wiley.com/doi/10.1111/phpr.12199/abstract">Baker (2016)</a>; <a href="http://press.uchicago.edu/ucp/books/book/chicago/R/bo3627517.html">Stanovich (2004)</a>, pp. 224-275; <a href="http://www.tandfonline.com/doi/abs/10.1080/13546783.2012.713178">Stanovich (2013)</a>.</p><p><strong>2.</strong> For more on forecasting accuracy, see <a href="https://www.openphilanthropy.org/blog/efforts-improve-accuracy-our-judgments-and-forecasts">this blog post</a>. My use of research on the psychological predictors of forecasting accuracy for the purposes of doing moral philosophy is one example of my support for the use of &quot;ameliorative psychology&quot; in philosophical practice — see e.g. Bishop &amp; Trout (<a href="https://global.oup.com/academic/product/epistemology-and-the-psychology-of-human-judgment-9780195162295?lang=en&cc=us">2004</a>, <a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1747-9991.2008.00161.x/full">2008</a>).</p><p><strong>3.</strong> Specifically, the scenario I try to imagine (and make conditional forecasts about) looks something like this:</p><ol><li>In the distant future, I am non-destructively &quot;uploaded.&quot; In other words, my brain and some supporting cells are scanned (non-destructively) at a fine enough spatial and chemical resolution that, when this scan is combined with accurate models of how different cell types carry out their information-processing functions, one can create an executable computer model of my brain that matches my biological brain&#x27;s input-output behavior almost exactly. This whole brain emulation (&quot;em&quot;) is then connected to a virtual world: computed inputs are fed to the em&#x27;s (now virtual) signal transduction neurons for sight, sound, etc., and computed outputs from the em&#x27;s virtual arm movements, speech, etc. are received by the virtual world, which computes appropriate changes to the virtual world in response. (I don&#x27;t think anything remotely like this will ever happen, but as far as I know it is a <em>physically possible</em> world that can be described in some detail; for one attempt, see <a href="https://global.oup.com/academic/product/the-age-of-em-9780198754626?cc=us&lang=en&">Hanson 2016</a>.) 
Given functionalism, this &quot;em&quot; has the same memories, personality, and conscious experience that I have, though it experiences quite a shock when it awakens to a virtual world that might look and feel somewhat different from the &quot;real&quot; world.</li><li>This initial em is copied thousands of times. Some of the copies interact inside the same virtual world, other copies are placed inside isolated virtual worlds.</li><li>Then, these ems spend a very long time (a) collecting and generating arguments and evidence about morality and related topics, (b) undergoing various experiences, in varying orders, and reflecting on those experiences, (c) dialoguing with ems sourced from other biological humans who have different values than I do, and perhaps with sophisticated chat-bots meant to simulate the plausible reasoning of other types of people (from the past, or from other worlds) who were not available to be uploaded, and so on. They are able to do these things for a very long time because they and their virtual worlds are run at speeds thousands of times faster than my biological brain runs, allowing subjective eons to pass in mere months of &quot;objective&quot; time.</li><li>Finally, at some time, the ems dialogue with each other about which values seem &quot;best,&quot; they engage in moral trade (<a href="http://www.journals.uchicago.edu/doi/abs/10.1086/682187">Ord 2015</a>), and they try to explain to me what values they think I should have and why. In the end, I am not forced to accept any of the values they then hold (collectively or individually), but I am able to come to much better-informed moral judgments than I could have without their input.</li></ol><p>For more context on this sort of values extrapolation procedure, see <a href="https://intelligence.org/files/IdealAdvisorTheories.pdf">Muehlhauser &amp; Williamson (2013)</a>.</p><p><strong>4.</strong> For more on forecasting &quot;best practices,&quot; see <a href="https://www.openphilanthropy.org/blog/efforts-improve-accuracy-our-judgments-and-forecasts">this blog post</a>.</p><p><strong>5.</strong> Following <a href="http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=116609&fileId=S0265052502192077">Hanson (2002)</a> and ch. 2 of <a href="https://rucore.libraries.rutgers.edu/rutgers-lib/40469/">Beckstead (2013)</a>, I consider my moral intuitions in the context of Bayesian curve-fitting. To explain, I&#x27;ll quote <a href="https://rucore.libraries.rutgers.edu/rutgers-lib/40469/">Beckstead (2013)</a> at some length:</p><blockquote>Curve fitting is a problem frequently discussed in the philosophy of science. In the standard presentation, a scientist is given some data points, usually with an independent variable and a dependent variable, and is asked to predict the values of the dependent variable given other values of the independent variable. Typically, the data points are <em>observations</em>, such as &quot;measured height&quot; on a scale or &quot;reported income&quot; on a survey, rather than true values, such as height or income. Thus, in making predictions about additional data points, the scientist has to account for the possibility of error in the observations. By an error process I mean anything that makes the observed values of the data points differ from their true values. Error processes could arise from a faulty scale, failures of memory on the part of survey participants, bias on the part of the experimenter, or any number of other sources. 
While some treatments of this problem focus on predicting observations (such as measured height), I&#x27;m going to focus on predicting the true values (such as true height).</blockquote><blockquote>…For any consistent data set, it is possible to construct a curve that fits the data exactly… If the scientist chooses one of these polynomial curves for predictive purposes, the result will usually be <em>overfitting</em>, and the scientist will make worse predictions than he would have if he had chosen a curve that did not fit the data as well, but had other virtues, such as a straight line. On the other hand, always going with the simplest curve and giving no weight to the data leads to <em>underfitting</em>…</blockquote><blockquote>I intend to carry over our thinking about curve fitting in science to reflective equilibrium in moral philosophy, so I should note immediately that curve fitting is not limited to the case of two variables. When we must understand relationships between multiple variables, we can turn to multiple-dimensional spaces and fit planes (or hyperplanes) to our data points. Different axes might correspond to different considerations which seem relevant (such as total well-being, equality, number of people, fairness, etc.), and another axis could correspond to the value of the alternative, which we can assume is a function of the relevant considerations. Direct Bayesian updating on such data points would be impractical, but the philosophical issues will not be affected by these difficulties.</blockquote><blockquote>…On a Bayesian approach to this problem, the scientist would consider a number of different hypotheses about the relationship between the two variables, including both hypotheses about the phenomena (the relationship between X and Y) and hypotheses about the error process (the relationship between observed values of Y and true values of Y) that produces the observations…</blockquote><blockquote>…Lessons from the Bayesian approach to curve fitting apply to moral philosophy. Our moral intuitions are the data, and there are error processes that make our moral intuitions deviate from the truth. The complete moral theories under consideration are the hypotheses about the phenomena. (Here, I use &quot;theory&quot; broadly to include any complete set of possibilities about the moral truth. My use of the word &quot;theory&quot; does not assume that the truth about morality is simple, systematic, and neat rather than complex, circumstantial, and messy.) If we expect the error processes to be widespread and significant, we must rely on our priors more. If we expect the error processes to be, in addition, biased and correlated, then we will have to rely significantly on our priors even when we have a lot of intuitive data.</blockquote><p>Beckstead then summarizes the framework with a table (p. 
32), edited to fit into LessWrong&#x27;s formatting:</p><ul><li>Hypotheses about phenomena</li><ul><li><em>(Science)</em> Different trajectories of a ball that has been dropped</li><li><em>(Moral Philosophy)</em> Moral theories (specific versions of utilitarianism, Kantianism, contractualism, pluralistic deontology, etc.)</li></ul><li>Hypotheses about error processes</li><ul><li><em>(Science)</em> Our position measurements are accurate on average, and are within 1 inch 95% of the time (with normally distributed error)</li><li><em>(Moral Philosophy)</em> Different hypotheses about the causes of error in historical cases; cognitive and moral biases; different hypotheses about the biases that cause inconsistent judgments in important philosophical cases</li></ul><li>Observations</li><ul><li><em>(Science)</em> Recorded position of a ball at different times recorded with a certain clock</li><li><em>(Moral Philosophy)</em> Intuitions about particular cases or general principles, and any other relevant observations</li></ul><li>Background theory</li><ul><li><em>(Science)</em> The ball never bounces higher than the height it started at. The ball always moves along a continuous trajectory.</li><li><em>(Moral Philosophy)</em> Meta-ethical or normative background theory (or theories)</li></ul></ul><p><strong>6.</strong> For more on this, see <a href="https://www.openphilanthropy.org/sites/default/files/Carl_Shulman_08-19-16_%28public%29.pdf">my conversation with Carl Shulman</a>, <a href="http://www.journals.uchicago.edu/doi/abs/10.1086/683441">O&#x27;Neill (2015)</a>, the literature on the evolution of moral values (e.g. <a href="http://www.brill.com/products/book/evolved-morality-biology-and-philosophy-human-conscience">de Waal et al. 2014</a>; <a href="https://mitpress.mit.edu/books/moral-psychology">Sinnott-Armstrong &amp; Miller 2007</a>; <a href="https://mitpress.mit.edu/books/evolution-morality">Joyce 2005</a>), the literature on moral psychology more generally (e.g. <a href="http://www.sciencedirect.com/science/article/pii/B9780124072367000024">Graham et al. 2013</a>; <a href="https://global.oup.com/academic/product/the-moral-psychology-handbook-9780199582143?cc=us&lang=en&">Doris 2010</a>; <a href="https://global.oup.com/academic/product/moral-brains-9780199357666?cc=us&lang=en&">Liao 2016</a>; <a href="http://link.springer.com/book/10.1007/978-3-319-01369-5">Christen et al. 2014</a>; <a href="https://www.ncbi.nlm.nih.gov/pubmed/16209802">Sunstein 2005</a>), the literature on how moral values vary between cultures and eras (e.g. see <a href="https://global.oup.com/academic/product/the-geography-of-morals-9780190212155?cc=us&lang=en&">Flanagan 2016</a>; <a href="https://www.cambridge.org/core/journals/perspectives-on-politics/article/changing-mass-priorities-the-link-between-modernization-and-democracy/816079F2778E68E46984469D634741D0">Inglehart &amp; Welzel 2010</a>; <a href="http://www.penguin.com/book/the-better-angels-of-our-nature-by-steven-pinker/9780143122012">Pinker 2011</a>; <a href="http://press.princeton.edu/titles/10419.html">Morris 2015</a>; <a href="http://knopfdoubleday.com/book/56526/the-moral-consequences-of-economic-growth/">Friedman 2005</a>; <a href="https://global.oup.com/academic/product/the-emotional-construction-of-morals-9780199283019?cc=us&lang=en&">Prinz 2007</a>, pp. 187-195), and the literature on moral thought experiments (e.g. <a href="https://www.routledge.com/What-If-Collected-Thought-Experiments-in-Philosophy/Tittle/p/book/9780321202789">Tittle 2004</a>, ch. 
7). See also <a href="http://aristotelian.oxfordjournals.org/content/116/2/127">Wilson (2016)</a>&#x27;s comments on internal and external validity in ethical thought experiments, and <a href="http://www.ingentaconnect.com/contentone/imp/jcs/2017/00000024/F0020001/art00002">Bakker (2017)</a> on &quot;alien philosophy.&quot;</p><p>I do not read much fiction, but I suspect that some types of fiction — e.g. historical fiction, fantasy, and science fiction — can help readers to temporarily transport themselves into fully-realized alternate realities, in which readers can test how their moral intuitions differ when they are temporarily &quot;lost&quot; in an alternate world.</p><p><strong>7.</strong> There are many sources which discuss how people&#x27;s values seem to change along with (and perhaps in response to) components of my proposed extrapolation procedure, such as learning more facts, reasoning through more moral arguments, and dialoguing with others who have different values. See e.g. <a href="https://www.cambridge.org/core/journals/perspectives-on-politics/article/changing-mass-priorities-the-link-between-modernization-and-democracy/816079F2778E68E46984469D634741D0">Inglehart &amp; Welzel (2010)</a>, <a href="http://www.penguin.com/book/the-better-angels-of-our-nature-by-steven-pinker/9780143122012">Pinker (2011)</a>, <a href="http://us.macmillan.com/books/9780805096910">Shermer (2015)</a>, and <a href="http://www.journals.uchicago.edu/doi/abs/10.1086/686003">Buchanan &amp; Powell (2016)</a>. See also the literatures on &quot;enlightened preferences&quot; (<a href="http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780511839900">Althaus 2003</a>, chs. 4-6) and on &quot;<a href="http://cdd.stanford.edu/what-is-deliberative-polling/">deliberative polling</a>.&quot;</p><p><strong>8.</strong> For example, as I&#x27;ve learned more, considered more moral arguments, and dialogued more with people who don&#x27;t share my values, my moral values have become more &quot;secular-rational&quot; and &quot;self-expressive&quot; (<a href="https://www.cambridge.org/core/journals/perspectives-on-politics/article/changing-mass-priorities-the-link-between-modernization-and-democracy/816079F2778E68E46984469D634741D0">Inglehart &amp; Welzel 2010</a>), more geographically global, more extensive (e.g. 
throughout more of the animal kingdom), less <a href="https://en.wikipedia.org/wiki/Person-affecting_view">person-affecting</a>, and subject to greater moral uncertainty (<a href="http://onlinelibrary.wiley.com/doi/10.1111/phc3.12408/abstract">Bykvist 2017</a>).</p> lukeprog FZ5aTJFXZMpPL7ycK 2017-12-12T21:46:08.834Z Comment by lukeprog on Oxford Prioritisation Project Review https://www.lesswrong.com/posts/h99f5AqSJRPaqsT8m/oxford-prioritisation-project-review#P9reGhq6Wt5FthgoR <div class="ory-row"><div class="ory-cell ory-cell-sm-12 ory-cell-xs-12"><div class="ory-cell-inner ory-cell-leaf"><div><p>Hurrah failed project reports!</p></div></div></div></div> lukeprog P9reGhq6Wt5FthgoR 2017-10-14T00:12:05.575Z Comment by lukeprog on Ten small life improvements https://www.lesswrong.com/posts/nW6S5QJmPLhWoagfH/ten-small-life-improvements#GuiX467FWhBdJDT5D <p>One of my most-used tools is very simple: an Alfred snippet that lets me paste-as-plain-text using Cmd+Opt+V.</p> lukeprog GuiX467FWhBdJDT5D 2017-08-24T15:55:33.787Z Comment by lukeprog on Rescuing the Extropy Magazine archives https://www.lesswrong.com/posts/gxouYfj9jJLTzhmkM/rescuing-the-extropy-magazine-archives#cEPst4B6E5wNgZMxX <p>Thanks!</p> lukeprog cEPst4B6E5wNgZMxX 2017-07-01T21:36:21.850Z Comment by lukeprog on LessWrong 2.0 Feature Roadmap & Feature Suggestions https://www.lesswrong.com/posts/6XZLexLJgc5ShT4in/lesswrong-2-0-feature-roadmap-and-feature-suggestions#7y2QgEYwTRfBjBEZX <div class="ory-row"><div class="ory-cell ory-cell-sm-12 ory-cell-xs-12"><div class="ory-cell-inner ory-cell-leaf"><div><p>From a user&#x27;s profile, be able to see their comments in addition to their posts.</p><p>Dunno about others, but this is actually one of the LW features I use the most.</p><p>(Apologies if this is listed somewhere already and I missed it.)</p></div></div></div></div> lukeprog 7y2QgEYwTRfBjBEZX 2017-07-01T06:43:09.823Z Comment by lukeprog on LessWrong 2.0 Feature Roadmap & Feature Suggestions https://www.lesswrong.com/posts/6XZLexLJgc5ShT4in/lesswrong-2-0-feature-roadmap-and-feature-suggestions#rup3rWnGvLtxZd9dg <div class="ory-row"><div class="ory-cell ory-cell-sm-12 ory-cell-xs-12"><div class="ory-cell-inner ory-cell-leaf"><div><p>Probably not suitable for launch, but given that the epistemic seriousness of the users is the most important &quot;feature&quot; for me and some other people I&#x27;ve spoken to, I wonder if some kind of &quot;user badges&quot; thing might be helpful, especially if it influences the weight that upvotes and downvotes from those users have. E.g. 
one badge could be &quot;has read &gt;60% of the sequences, as &#x27;verified&#x27; by one of the 150 people the LW admins trust to verify such a thing about someone&quot; and &quot;verified superforecaster&quot; and probably some other things I&#x27;m not immediately thinking of.</p></div></div></div></div> lukeprog rup3rWnGvLtxZd9dg 2017-06-23T23:37:09.865Z Comment by lukeprog on Book recommendation requests https://www.lesswrong.com/posts/4X29hbgaGQzEMpGCK/book-recommendation-requests#juSWyC2GNfou69MgR <ol> <li>Constantly.</li> <li>Frequently.</li> </ol> lukeprog juSWyC2GNfou69MgR 2017-06-03T23:32:00.328Z Comment by lukeprog on Book recommendation requests https://www.lesswrong.com/posts/4X29hbgaGQzEMpGCK/book-recommendation-requests#vEdpngY7NTQgxG24C <p><a href="http://lesswrong.com/lw/3gu/the_best_textbooks_on_every_subject/">Best Textbooks on Every Subject</a></p> lukeprog vEdpngY7NTQgxG24C 2017-06-02T19:40:42.748Z Comment by lukeprog on AGI and Mainstream Culture https://www.lesswrong.com/posts/jvKYSH7K2pKxxXnus/agi-and-mainstream-culture#QZ2MA4dWdTHbPMhzA <p>Thanks for briefly describing those <em>Doctor Who</em> episodes.</p> lukeprog QZ2MA4dWdTHbPMhzA 2017-05-23T19:28:54.174Z Comment by lukeprog on The Best Textbooks on Every Subject https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject#umAXKG8QQqwrXEWyf <p>Lists of textbook award winners like <a href="http://www.taaonline.net/past-textbook-award-recipients">this list</a> might also be useful.</p> lukeprog umAXKG8QQqwrXEWyf 2017-03-07T22:24:42.993Z Comment by lukeprog on The Best Textbooks on Every Subject https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject#F892RCvqWHDk9vjb3 <p>Fixed, thanks.</p> lukeprog F892RCvqWHDk9vjb3 2017-02-25T21:44:11.720Z Comment by lukeprog on Can the Chain Still Hold You? https://www.lesswrong.com/posts/iETtCZcfmRyHp69w4/can-the-chain-still-hold-you#qnJQaF2hRdd2kkh65 <p>Today I encountered a real-life account of the chain story — involving a cow rather than an elephant — around 24:10 into the &quot;<a href="http://backstoryradio.org/shows/best-of-backstory">Best of BackStory, Vol. 1</a>&quot; episode of the podcast <em>BackStory</em>.</p> lukeprog qnJQaF2hRdd2kkh65 2017-01-27T14:41:03.746Z Comment by lukeprog on CFAR’s new focus, and AI Safety https://www.lesswrong.com/posts/3zYXD8RyB6fv2czFz/cfar-s-new-focus-and-ai-safety#6XLpTXu3uMNXaKyh2 <p>&quot;Accuracy-boosting&quot; or &quot;raising accuracy&quot;?</p> lukeprog 6XLpTXu3uMNXaKyh2 2016-12-07T18:06:38.611Z Comment by lukeprog on Paid research assistant position focusing on artificial intelligence and existential risk https://www.lesswrong.com/posts/fLdHDoZBNYsCFWPQS/paid-research-assistant-position-focusing-on-artificial#QwkJiaxgQ4ASvmuMR <p><a href="https://webcache.googleusercontent.com/search?q=cache:jF3eY7RPn7AJ:https://www.yalestudentjobs.org/JobXJobDetail.aspx%3FJobId%3D13207+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=us">Source</a>. 
But the <a href="https://www.yalestudentjobs.org/JobXJobDetail.aspx?JobId=13207">non-cached page</a> says &quot;The details of this job cannot be viewed at this time,&quot; so maybe the job opening is no longer available.</p> <p>FWIW, I'm a bit familiar with Dafoe's thinking on the issues, and I think it would be a good use of time for the right person to work with him.</p> lukeprog QwkJiaxgQ4ASvmuMR 2016-05-03T18:28:24.969Z Comment by lukeprog on Audio version of Rationality: From AI to Zombies out of beta https://www.lesswrong.com/posts/ufGGbPGSxmw3p5ta9/audio-version-of-rationality-from-ai-to-zombies-out-of-beta#gNrhTf8koiTNchyhF <p>Hi Rick, any updates on the Audible version?</p> lukeprog gNrhTf8koiTNchyhF 2016-04-22T14:44:38.240Z Comment by lukeprog on [link] Simplifying the environment: a new convergent instrumental goal https://www.lesswrong.com/posts/Dr9a9i7GX7jJXxwqE/link-simplifying-the-environment-a-new-convergent#3ordo7zKiZMm7zrgR <p>See also: <a href="https://scholar.google.com/scholar?cluster=9557614170081724663&amp;hl=en&amp;as_sdt=1,5">https://scholar.google.com/scholar?cluster=9557614170081724663&amp;hl=en&amp;as_sdt=1,5</a></p> lukeprog 3ordo7zKiZMm7zrgR 2016-04-22T14:43:10.592Z Comment by lukeprog on Why CFAR? The view from 2015 https://www.lesswrong.com/posts/MhGamLhQTGFFhLchm/why-cfar-the-view-from-2015#asfy9ZD5ZyRbn6heK <p>Just donated!</p> lukeprog asfy9ZD5ZyRbn6heK 2015-12-20T19:57:54.441Z Comment by lukeprog on Audio version of Rationality: From AI to Zombies out of beta https://www.lesswrong.com/posts/ufGGbPGSxmw3p5ta9/audio-version-of-rationality-from-ai-to-zombies-out-of-beta#MvExLCnxCbzdraa24 <p>Hurray!</p> lukeprog MvExLCnxCbzdraa24 2015-12-01T04:51:13.682Z Comment by lukeprog on Audio version of Rationality: From AI to Zombies out of beta https://www.lesswrong.com/posts/ufGGbPGSxmw3p5ta9/audio-version-of-rationality-from-ai-to-zombies-out-of-beta#jDNajKYwLx9794BRm <p>Any chance you'll eventually get this up on Audible? I suspect that in the long run, it can find a wider audience there.</p> lukeprog jDNajKYwLx9794BRm 2015-11-27T15:24:23.787Z Comment by lukeprog on The Best Textbooks on Every Subject https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject#qbcSrwad9Zn4yaJF2 <p>Another attempt to do something like this thread: <a href="https://news.ycombinator.com/item?id=10313289">Viva la Books</a>.</p> lukeprog qbcSrwad9Zn4yaJF2 2015-10-03T05:52:52.571Z Comment by lukeprog on Estimate Stability https://www.lesswrong.com/posts/K33mYmEk9LoTbN92L/estimate-stability#zTWr3yncEE5Xct5x5 <p>I guess <a href="http://folk.uio.no/josang/sl/">subjective logic</a> is also trying to handle this kind of thing. From Jøsang's <a href="http://folk.uio.no/josang/papers/subjective_logic.pdf">book draft</a>:</p> <blockquote> <p>Subjective logic is a type of probabilistic logic that allows probability values to be expressed with degrees of uncertainty. The idea of probabilistic logic is to combine the strengths of logic and probability calculus, meaning that it has binary logic’s capacity to express structured argument models, and it has the power of probabilities to express degrees of truth of those arguments. 
The idea of subjective logic is to extend probabilistic logic by also expressing uncertainty about the probability values themselves, meaning that it is possible to reason with argument models in presence of uncertain or incomplete evidence.</p> </blockquote> <p>Though maybe this particular formal system has really undesirable properties, I don't know.</p> lukeprog zTWr3yncEE5Xct5x5 2015-08-20T17:30:53.991Z Comment by lukeprog on MIRI's 2015 Summer Fundraiser! https://www.lesswrong.com/posts/owEitP6NZjcxXpio6/miri-s-2015-summer-fundraiser#H5o5g2QwmcsDZpNtk <p>Donated $300.</p> lukeprog H5o5g2QwmcsDZpNtk 2015-07-21T01:56:19.191Z Comment by lukeprog on Some Heuristics for Evaluating the Soundness of the Academic Mainstream in Unfamiliar Fields https://www.lesswrong.com/posts/fyZBtNB3Ki3fM4a6Y/some-heuristics-for-evaluating-the-soundness-of-the-academic#wD8G3vs9NGGypyA77 <p>Never heard of him.</p> lukeprog wD8G3vs9NGGypyA77 2015-07-17T13:07:17.541Z Comment by lukeprog on [link] FLI's recommended project grants for AI safety research announced https://www.lesswrong.com/posts/ThpEr46AHgXGz2dSx/link-fli-s-recommended-project-grants-for-ai-safety-research#awqFSbXZgp8WXxm6N <p>For those who haven't been around as long as Wei Dai…</p> <p>Eliezer tells the story of coming around to a more Bostromian view, circa 2003, in his <a href="http://wiki.lesswrong.com/wiki/Yudkowsky's_Coming_of_Age">coming of age</a> sequence.</p> lukeprog awqFSbXZgp8WXxm6N 2015-07-02T07:08:14.186Z Comment by lukeprog on GiveWell event for SF Bay Area EAs https://www.lesswrong.com/posts/cXvdyZWFKKSQKYH9C/givewell-event-for-sf-bay-area-eas#xT4dt9yKrfGvGnh8E <p>Just FYI, I plan to be there.</p> lukeprog xT4dt9yKrfGvGnh8E 2015-06-25T20:42:51.429Z Comment by lukeprog on A map: Typology of human extinction risks https://www.lesswrong.com/posts/43RiN69FdK34TfYW4/a-map-typology-of-human-extinction-risks#pg9Zqw32ypy7Aqzv9 <p>Any idea when the book is coming out?</p> lukeprog pg9Zqw32ypy7Aqzv9 2015-06-24T17:43:12.734Z Comment by lukeprog on [link] Baidu cheats in an AI contest in order to gain a 0.24% advantage https://www.lesswrong.com/posts/SYaYvGAM7JpDTpzp8/link-baidu-cheats-in-an-ai-contest-in-order-to-gain-a-0-24#g9fWMe338tDEd4dp4 <p>Just FYI to readers: the source of the first image is <a href="https://www.cbinsights.com/blog/artificial-intelligence-venture-capital-2014/">here</a>.</p> lukeprog g9fWMe338tDEd4dp4 2015-06-08T01:57:10.615Z Comment by lukeprog on Learning to get things right first time https://www.lesswrong.com/posts/qRey4Lmjda9JjA3nD/learning-to-get-things-right-first-time#7TNAbaqWCaSandjqc <p>I don't know if this is commercially feasible, but I do like this idea from the perspective of building civilizational competence at getting things right on the first try.</p> lukeprog 7TNAbaqWCaSandjqc 2015-05-30T00:17:06.638Z Comment by lukeprog on Request for Advice : A.I. - can I make myself useful? https://www.lesswrong.com/posts/Ms8BF8rE9rcHAFvrQ/request-for-advice-a-i-can-i-make-myself-useful#6yiuZi7yzjLwxb3XR <p>Might you be able to slightly retrain so as to become an expert on medium-term and long-term biosecurity risks? Biological engineering presents serious GCR risk over the next 50 years (and of course after that, as well), and very few people are trying to think through the issues on more than a 10-year time horizon. FHI, CSER, GiveWell, and perhaps others each have a decent chance of wanting to hire people into such research positions over the next few years. 
(GiveWell is looking to hire a biosecurity program manager right now, but I assume you can't acquire the requisite training and background <em>immediately</em>.)</p> lukeprog 6yiuZi7yzjLwxb3XR 2015-05-29T18:17:43.531Z Comment by lukeprog on CFAR-run MIRI Summer Fellows program: July 7-26 https://www.lesswrong.com/posts/uNfP6DHScTyH2RQY6/cfar-run-miri-summer-fellows-program-july-7-26#8TJ5CijoMheubo5kd <p>I think it's partly not doing enough far-advance planning, but also partly just a greater-than-usual willingness to Try Things that seem like good ideas even if the timeline is a bit rushed. That's how the original minicamp happened, which ended up going so well that it inspired us to develop and launch CFAR.</p> lukeprog 8TJ5CijoMheubo5kd 2015-04-29T23:57:55.306Z Comment by lukeprog on The Effective Altruism Handbook https://www.lesswrong.com/posts/WdWgQv9QXrF6uc9BB/the-effective-altruism-handbook#qFFdNeYfNJea7xXo3 <p>People have complained about Sumatra not working with MIRI's PDF ebooks, too. It was hard enough already to get our process to output the links we want on most readers, so we decided not to make the extra effort to additionally support Sumatra. I'm not sure what it would take.</p> lukeprog qFFdNeYfNJea7xXo3 2015-04-26T03:02:30.605Z Comment by lukeprog on The Best Textbooks on Every Subject https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject#EN3d7TEjFfxib3vtm <p>Updated, thanks!</p> lukeprog EN3d7TEjFfxib3vtm 2015-04-15T00:19:58.404Z Comment by lukeprog on The Best Textbooks on Every Subject https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject#Agx5GEHu9MjPZ5Tdn <p>Fixed, thanks.</p> lukeprog Agx5GEHu9MjPZ5Tdn 2015-04-15T00:07:56.197Z Comment by lukeprog on How urgent is it to intuitively understand Bayesianism? https://www.lesswrong.com/posts/8jJeuiZ5Zb7kqC6F8/how-urgent-is-it-to-intuitively-understand-bayesianism#jRt3a65DBnYfXp9ik <p>Maybe just use <a href="http://betterexplained.com/articles/understanding-bayes-theorem-with-ratios/">odds ratios</a>. That's what I use when I'm trying to make updates on the spot.</p> lukeprog jRt3a65DBnYfXp9ik 2015-04-07T02:27:47.038Z Comment by lukeprog on Just a casual question regarding MIRI https://www.lesswrong.com/posts/TeYmCMBs3vsmKABh2/just-a-casual-question-regarding-miri#CAycN8b2vb3tzm8hi <p>Working on MIRI's <a href="https://intelligence.org/technical-agenda/">current technical agenda</a> mostly requires a background in computer science with an unusually strong focus on logic: see details <a href="https://intelligence.org/research-guide/">here</a>. That said, the scope of MIRI's research program should be expanding over time. E.g. see <a href="http://agentfoundations.org/item?id=130">Patrick's recent proposal</a> to model goal stability challenges in a machine learning system, which would require more typical AI knowledge than has usually been the case for MIRI's work so far.</p> <p>MIRI's research isn't really what a mathematician would typically think of as &quot;math research&quot; — it's more like theory-heavy computer science research with an unusually significant math/logic component, as is the case with a few other areas of computer science research, e.g. 
program analysis.</p> <p>Also see the &quot;Our recommended path for becoming a MIRI research fellow&quot; section on our <a href="https://intelligence.org/careers/research-fellow/">research fellow job posting</a>.</p> lukeprog CAycN8b2vb3tzm8hi 2015-03-22T22:05:42.493Z Comment by lukeprog on The Best Textbooks on Every Subject https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-textbooks-on-every-subject#em3nBHbNP74RK626W <p>Fixed, thanks!</p> lukeprog em3nBHbNP74RK626W 2015-03-19T23:08:36.185Z Comment by lukeprog on Best Explainers on Different Subjects https://www.lesswrong.com/posts/xB58t5oe8GccDGbZy/best-explainers-on-different-subjects#QFBncwFbgwrqFSZPi <p>I tried this earlier, with <a href="http://lesswrong.com/lw/899/great_explanations/">Great Explanations</a>.</p> lukeprog QFBncwFbgwrqFSZPi 2015-03-18T23:37:39.168Z Comment by lukeprog on Rationality: From AI to Zombies https://www.lesswrong.com/posts/xmoYza9vgcRvWD5PA/rationality-from-ai-to-zombies#ona7nduZe8wWxbEcD <blockquote> <p>I can't mail that address, I get a failure message from Google</p> </blockquote> <p>Oops. Should be fixed now.</p> lukeprog ona7nduZe8wWxbEcD 2015-03-15T18:27:14.800Z Comment by lukeprog on Calibration Test with database of 150,000+ questions https://www.lesswrong.com/posts/WLun5oQGXu3aLN7oJ/calibration-test-with-database-of-150-000-questions#fvyW3CiimJkJMg37q <p>Thanks! BTW, I'd prefer to have 1% <em>and</em> 0.1% <em>and</em> 99% <em>and</em> 99.9% as options, rather than skipping over the 1% and 99% options as you have it now.</p> lukeprog fvyW3CiimJkJMg37q 2015-03-13T21:57:50.093Z Comment by lukeprog on Calibration Test with database of 150,000+ questions https://www.lesswrong.com/posts/WLun5oQGXu3aLN7oJ/calibration-test-with-database-of-150-000-questions#DrTJTY7bzXuQzkafY <p>Fair enough. I've edited my original comment.</p> <p>(For posterity: the text for my original comment's first hyperlink originally read &quot;0 and 1 are not probabilities&quot;.)</p> lukeprog DrTJTY7bzXuQzkafY 2015-03-13T21:55:55.311Z Comment by lukeprog on Rationality: From AI to Zombies https://www.lesswrong.com/posts/xmoYza9vgcRvWD5PA/rationality-from-ai-to-zombies#TCNLJ9wYWZxH55nCT <p>Which is <a href="http://en.wikipedia.org/wiki/List_of_longest_novels">roughly</a> the length of <em>War and Peace</em> or <em>Atlas Shrugged</em>.</p> lukeprog TCNLJ9wYWZxH55nCT 2015-03-13T21:48:28.029Z Comment by lukeprog on Calibration Test with database of 150,000+ questions https://www.lesswrong.com/posts/WLun5oQGXu3aLN7oJ/calibration-test-with-database-of-150-000-questions#chpkKr2v4Fb97sKBg <p>0% probability is my most common answer as well, but I'm using it less often than I was choosing 50% on the CFAR calibration app (which forces a binary answer choice rather than an open-ended answer choice). The CFAR app has lots of questions like &quot;Which of these two teams won the Superbowl in 1978&quot; where I just have no idea. The trivia database Nanashi is using has, for me, a greater proportion of questions on which my credence is something more interesting than an ignorance prior.</p> lukeprog chpkKr2v4Fb97sKBg 2015-03-13T18:03:23.961Z Comment by lukeprog on Calibration Test with database of 150,000+ questions https://www.lesswrong.com/posts/WLun5oQGXu3aLN7oJ/calibration-test-with-database-of-150-000-questions#7mF3LGYQ27wPAQ7B2 <p><a href="http://lesswrong.com/lw/mp/0_and_1_are_not_probabilities/">I'd prefer not to allow 0 and 1 as available credences</a>. 
But if 0 remained as an option I would just interpret it as &quot;very close to 0&quot; and then keep using the app, though if a future version of the app showed me my <a href="http://www.yudkowsky.net/rational/technical/">Bayes score</a> then the difference between what the app allows me to choose (0%) and what I'm interpreting 0 to mean (&quot;very close to 0&quot;) could matter.</p> lukeprog 7mF3LGYQ27wPAQ7B2 2015-03-13T17:59:17.922Z MIRI's 2014 Summer Matching Challenge https://www.lesswrong.com/posts/MLxdKGWvpCToz7vEE/miri-s-2014-summer-matching-challenge <p><small>(Cross-posted from <a href="http://intelligence.org/2014/07/21/2014-summer-matching-challenge/">MIRI's blog</a>. <a href="http://intelligence.org/">MIRI</a> maintains Less Wrong, with generous help from <a href="http://trikeapps.com/">Trike Apps</a>, and much of the core content is written by salaried MIRI staff members.)</small></p> <p>Thanks to the generosity of several major donors,<sup>&dagger;</sup>&nbsp;every donation made to MIRI between now and August 15th, 2014 will be <strong>matched dollar-for-dollar</strong>, up to a total of $200,000! &nbsp;</p> <p align="center"><img src="https://intelligence.org/wp-content/uploads/2014/08/Summer-2014-finished.png" alt="" /></p> <p>Now is your chance to <strong>double your impact</strong> while helping us raise up to $400,000 (with matching) to fund <a href="http://intelligence.org/research/">our research program</a>.</p> <p><small>Corporate matching and monthly giving pledges will count towards the total! Please email <a href="mailto:malo@intelligence.org">malo@intelligence.org</a> if you intend on leveraging corporate matching (check <a href="https://doublethedonation.com/miri">here</a>, to see&nbsp;if your employer will match your donation) or would like to pledge 6 months of monthly donations, so that we can properly account for your contributions towards the fundraiser.</small></p> <p><small></small>(If you're unfamiliar with our mission, see: <a href="http://intelligence.org/2014/04/20/why-miri/">Why MIRI?</a>)</p> <p align="center"><a href="https://intelligence.org/donate/#donation-methods"><strong>Donate Now</strong></a></p> <p>&nbsp; <img class="img-rounded shadowed aligncenter" src="https://intelligence.org/wp-content/uploads/2013/12/workshop-horizontal-3.jpg" alt="" /></p> <h3>Accomplishments Since Our Winter&nbsp;2013 Fundraiser Launched:</h3> <ul> <li>Hired&nbsp;<strong>2 new Friendly AI researchers</strong>,&nbsp;Benja Fallenstein &amp; Nate Soares. Since March, they've authored or co-authored 4 papers/reports, with several others in the works. 
Right now they're traveling, to present papers at the <a href="http://vsl2014.at/">Vienna Summer of Logic</a>, <a href="http://www.aaai.org/Conferences/AAAI/aaai14.php">AAAI-14</a>, and <a href="http://agi-conf.org/2014/">AGI-14</a>.</li> <li><strong>5 new papers &amp; book chapters</strong>: &ldquo;<a href="http://intelligence.org/2013/12/18/new-paper-why-we-need-friendly-ai/">Why We Need Friendly AI</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/04/30/new-paper-the-errors-insights-and-lessons-of-famous-ai-predictions/">The errors, insights, and lessons of famous AI predictions</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/05/06/new-paper-problems-of-self-reference-in-self-improving-space-time-embedded-intelligence/">Problems of self-reference...</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/05/17/new-paper-program-equilibrium-prisoners-dilemma-via-lobs-theorem/">Program equilibrium...</a>,&rdquo; and &ldquo;<a href="http://intelligence.org/2014/06/19/new-chapter-cambridge-handbook-artificial-intelligence/">The ethics of artificial intelligence</a>.&rdquo;</li> <li><strong>11 new technical reports</strong>: <a href="http://intelligence.org/2013/12/31/7-new-technical-reports-and-a-new-paper/">7 reports from the December 2013 workshop</a>, &ldquo;<a href="http://intelligence.org/2014/04/10/new-report-botworld/">Botworld</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/05/30/new-report-loudness-priors-preference-relations/">Loudness...</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/06/06/new-report-distributions-allowing-tiling-staged-subjective-eu-maximizers/">Distributions allowing tiling...</a>,&rdquo; and &ldquo;<a href="http://intelligence.org/2014/06/23/new-report-non-omniscience-probabilistic-inference-metamathematics/">Non-omniscience...</a>&rdquo;</li> <li><strong>New book</strong>:&nbsp;<em><a href="http://intelligence.org/smarter-than-us/">Smarter Than Us</a>,</em>&nbsp;published&nbsp;both as an e-book&nbsp;and a paperback.</li> <li>Held <a href="http://intelligence.org/workshops/#may-2014">one MIRI workshop</a> and launched the <strong><a href="http://intelligence.org/mirix/">MIRIx program</a></strong>, which currently supports&nbsp;8 independently-organized&nbsp;Friendly AI&nbsp;discussion/research groups&nbsp;around the world.</li> <li><strong>New analyses</strong>: <a href="http://wiki.lesswrong.com/wiki/Naturalized_induction">Robby's posts on naturalized induction</a>, <a href="http://lukemuehlhauser.com/some-studies-which-could-improve-our-strategic-picture-of-superintelligence/">Luke's list of 70+ studies which could improve our picture of superintelligence strategy</a>, &ldquo;<a href="http://intelligence.org/2014/05/12/exponential-and-non-exponential/">Exponential and non-exponential trends in information technology</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/02/28/the-worlds-distribution-of-computation-initial-findings/">The world's distribution of computation</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/01/28/how-big-is-ai/">How big is the field of artificial intelligence?</a>,&rdquo; &ldquo;<a href="http://intelligence.org/2014/02/01/robust-cooperation-a-case-study-in-friendly-ai-research/">Robust cooperation: A case study in Friendly AI research</a>,&rdquo;&nbsp;&ldquo;<a href="/lw/jv2/is_my_view_contrarian/">Is my view contrarian?</a>,&rdquo; and &ldquo;<a 
href="http://www.slate.com/articles/technology/future_tense/2014/04/transcendence_science_can_we_really_upload_johnny_depp_s_brain.html">Can we really upload Johnny Depp's brain?</a>&rdquo;</li> <li><strong>Won $60,000+ in matching and prizes</strong> from sources that wouldn't have otherwise given to MIRI, <a href="http://intelligence.org/2014/05/06/liveblogging-the-svgives-fundraiser/">via the Silicon Valley Gives fundraiser</a>. (Thanks again, all you dedicated donors!)</li> <li><a href="http://intelligence.org/category/conversations/"><strong>49 new expert interviews</strong></a>, including interviews with <a href="http://intelligence.org/2013/12/13/aaronson/">Scott Aaronson</a> (MIT), <a href="http://intelligence.org/2014/03/19/max-tegmark/">Max Tegmark</a> (MIT),&nbsp;<a href="http://intelligence.org/2014/01/10/kathleen-fisher-on-high-assurance-systems/">Kathleen Fisher</a> (DARPA), <a href="http://intelligence.org/2014/04/30/suresh-jagannathan-on-higher-order-program-verification/">Suresh Jagannathan</a> (DARPA),&nbsp;<a href="http://intelligence.org/2014/02/15/andre-platzer-on-verifying-cyber-physical-systems/">Andr&eacute; Platzer</a> (CMU),&nbsp;<a href="http://intelligence.org/2014/03/26/anil-nerode/">Anil Nerode</a> (Cornell),&nbsp;<a href="http://intelligence.org/2014/02/21/john-baez-on-research-tactics/">John Baez</a> (UC Riverside), <a href="http://intelligence.org/2014/04/12/jonathan-millen/">Jonathan Millen</a> (MITRE), and <a href="http://intelligence.org/2014/06/23/roger-schell/">Roger Schell</a>.</li> <li><strong>4 transcribed&nbsp;conversations</strong> about MIRI strategy: <a href="http://intelligence.org/2014/01/13/miri-strategy-conversation-with-steinhardt-karnofsky-and-amodei/">1</a>, <a href="http://intelligence.org/2014/01/27/existential-risk-strategy-conversation-with-holden-karnofsky/">2</a>, <a href="http://intelligence.org/2014/02/11/conversation-with-jacob-steinhardt-about-miri-strategy/">3</a>, <a href="http://intelligence.org/2014/02/21/conversation-with-holden-karnofsky-about-future-oriented-philanthropy/">4</a>.</li> <li>Published a thorough &ldquo;<a href="http://intelligence.org/2013/12/20/2013-in-review-operations/">2013 in review</a>.&rdquo;</li> </ul> <h3>Ongoing&nbsp;Activities You Can Help Support</h3> <ul> <li>We're writing an overview of the Friendly AI technical agenda (as we see it) so far.</li> <li>We're currently developing and testing several tutorials on different pieces of the Friendly AI technical agenda (tiling agents, modal agents, etc.).</li> <li>We're writing several more papers and reports.</li> <li>We're growing the MIRIx program, largely to grow the pool of people we can plausibly hire as full-time FAI researchers in the next&nbsp;couple years.</li> <li>We're planning, or helping to plan,&nbsp;multiple research workshops, including&nbsp;the <a href="http://intelligence.org/2014/07/12/may-2015-decision-theory-workshop-cambridge/">May 2015 decision theory workshop at Cambridge University</a>.</li> <li>We're finishing the editing for&nbsp;a book version of Eliezer's&nbsp;<em><a href="http://wiki.lesswrong.com/wiki/Sequences">Sequences</a></em>.</li> <li>We're helping to fund further <a href="http://sparc-camp.org/">SPARC</a>&nbsp;activity, which provides education and skill-building to elite young math talent, and introduces them to ideas like effective altruism and global catastrophic risks.</li> <li>We're continuing to discuss formal collaboration opportunities with UC Berkeley faculty and development staff.</li> <li>We're helping 
Nick Bostrom promote his&nbsp;<a href="http://smile.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom-ebook/dp/B00LOOCGB2/"><em>Superintelligence</em></a> book in the U.S.</li> <li>We're investigating&nbsp;opportunities for supporting&nbsp;Friendly AI research via federal funding&nbsp;sources such as the NSF.</li> </ul> <p>Other projects are still being surveyed for likely cost and impact. See also our <a href="http://intelligence.org/2014/06/11/mid-2014-strategic-plan/">mid-2014 strategic plan</a>. We appreciate your support for our work!</p> <p><a href="https://intelligence.org/donate/#donation-methods">Donate now</a>, and seize a better than usual opportunity&nbsp;to move our work forward. If you have questions about donating, please contact&nbsp;Malo Bourgon&nbsp;at (510) 292-8776 or malo@intelligence.org.</p> <p><sup>&dagger;</sup>&nbsp;$200,000 of total matching funds has been provided by Jaan Tallinn, Edwin Evans, and Rick Schwall.</p> <p><small>Screenshot service provided by <a href="https://www.url2png.com/">URL2PNG.com</a> used to include self updating progress bar.</small></p> lukeprog MLxdKGWvpCToz7vEE 2014-08-07T20:03:24.171Z Will AGI surprise the world? https://www.lesswrong.com/posts/pAwhfwG6rgjabJL4T/will-agi-surprise-the-world <p><small>Cross-posted from <a href="http://lukemuehlhauser.com/will-agi-surprise-the-world/">my blog</a>.</small></p> <p>Yudkowsky <a href="/lw/hp5/after_critical_event_w_happens_they_still_wont/">writes</a>:</p> <blockquote> <p>In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."</p> <p>...</p> <p>Example 2: "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."</p> <p>Alternative projection: As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them. The same people who were talking about robot overlords earlier continue to talk about robot overlords. The same people who were talking about human irreproducibility continue to talk about human specialness. Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever. The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before. If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.</p> </blockquote> <p>My&nbsp;own projection goes more like this:</p> <blockquote> <p>As AI gets more sophisticated, and as more prestigious AI scientists begin to publicly acknowledge that AI is plausibly&nbsp;only 2-6 decades away, policy-makers and research funders will begin to&nbsp;respond to&nbsp;the AGI safety challenge, just like&nbsp;they began to respond&nbsp;to CFC&nbsp;damages in the late 70s,&nbsp;to global warming in the late 80s, and to synbio developments in the 2010s. 
As for society at large, I dunno.&nbsp;They'll think all kinds of random stuff for random reasons, and in some cases&nbsp;this will seriously impede effective policy, as it does in the USA for science education and immigration reform. Because AGI lends itself to arms races and is harder to handle adequately&nbsp;than global warming or nuclear security are, policy-makers and industry leaders will generally know AGI is coming but be unable to fund the needed&nbsp;efforts and coordinate effectively enough to ensure good outcomes.</p> </blockquote> <p>At least one clear difference between my projection and Yudkowsky's is that I expect AI-expert&nbsp;performance on the problem to improve&nbsp;substantially as a greater fraction of <em>elite</em> AI scientists begin to think about the issue in <a href="http://www.overcomingbias.com/2010/06/near-far-summary.html">Near mode&nbsp;rather than Far mode</a>.</p> <p>As a friend of mine suggested&nbsp;recently,&nbsp;current elite awareness of the AGI safety challenge is roughly where elite awareness of the global warming challenge was in the early 80s. Except, I expect&nbsp;elite acknowledgement&nbsp;of the AGI safety challenge to spread more slowly than it did for global warming or nuclear security, because&nbsp;AGI is tougher&nbsp;to forecast in general, and involves trickier philosophical nuances. (Nobody was ever tempted to say, "But as&nbsp;the nuclear chain reaction grows in power, it will necessarily&nbsp;become more moral!")</p> <p>Still, there is a worryingly non-negligible&nbsp;chance that AGI explodes "out of nowhere." Sometimes important theorems are proved suddenly after decades of failed attempts by other mathematicians, and sometimes a computational procedure is <a href="http://www.cs.nyu.edu/~mwalfish/papers/pepper-ndss12.pdf">sped up by 20 orders of magnitude with a single breakthrough</a>.</p> lukeprog pAwhfwG6rgjabJL4T 2014-06-21T22:27:31.620Z Some alternatives to “Friendly AI” https://www.lesswrong.com/posts/P2evgLpCZA2tRRJAR/some-alternatives-to-friendly-ai <p><small>Cross-posted from <a href="http://lukemuehlhauser.com/some-alternatives-to-friendly-ai/">my blog</a>.</small></p> <p>What does MIRI's <a href="http://intelligence.org/research/">research program</a> study?</p> <p>The most established term for this was coined&nbsp;by MIRI founder Eliezer Yudkowsky: "<strong><a href="http://en.wikipedia.org/wiki/Friendly_artificial_intelligence">Friendly AI</a></strong>."&nbsp;The term has some advantages, but it&nbsp;might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.</p> <p>What about&nbsp;<strong>safe AGI</strong> or&nbsp;<strong>AGI safety</strong>? These terms are probably easier to interpret than Friendly AI. Also, people&nbsp;<em>like</em> being safe, and governments like saying they're funding initiatives&nbsp;to keep&nbsp;the public safe.</p> <p>A friend of mine&nbsp;worries that these terms could provoke a defensive response (in AI researchers) of "Oh, so you think me and everybody&nbsp;<em>else</em> in AI is working on&nbsp;<em>unsafe AI</em>?" 
But I've never actually heard&nbsp;that response to "AGI safety" in the wild, and AI safety researchers&nbsp;regularly discuss&nbsp;"<a href="http://en.wikipedia.org/wiki/Software_system_safety">software system safety</a>" and&nbsp;"<a href="http://smile.amazon.com/Safe-Sound-Artificial-Intelligence-Applications/dp/0262062119/">AI safety</a>" and&nbsp;"<a href="http://ww2.cs.mu.oz.au/~unruh/sasemas/2006/index.html">agent safety</a>" and&nbsp;more specific topics like "<a href="http://lukemuehlhauser.com/wp-content/uploads/Perkins-Barto-Lyapunov-design-for-safe-reinforcement-learning.pdf">safe reinforcement learning</a>" without&nbsp;provoking negative reactions from people&nbsp;doing regular AI research.</p> <p>I'm more worried that&nbsp;a term like "safe AGI" could provoke&nbsp;a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be&nbsp;<em>safe</em>? Let me save you some time and tell you right now that's <em>impossible</em>. Your research program is a pipe dream."</p> <p>My reply goes something like "Yeah, it's&nbsp;<em>way</em> beyond our current capabilities, but lots of things that once looked impossible are now feasible&nbsp;because people worked really hard on them for a long time, and&nbsp;we don't think we can get the whole world to promise never to build AGI just because it's hard to make safe, so we're going to give AGI safety a solid try for a few decades and see what&nbsp;can be discovered." But that's probably not all&nbsp;<em>that</em> reassuring.</p> <p>How about&nbsp;<strong>high-assurance AGI?</strong> In computer science, a "<a href="http://www.sri.com/research-development/high-assurance-systems">high assurance system</a>" is one built from the ground up for unusually strong safety and/or security guarantees, because it's going to be used in safety-critical applications where human lives &mdash; or sometimes simply&nbsp;<em>billions of dollars</em> &mdash; are at stake (e.g. autopilot software or Mars rover software). So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.</p> <p>I think the main problem with this term is that, quite reasonably, nobody will believe that we&nbsp;can ever get anywhere <em>near</em> as much assurance in the behavior of an AGI as we&nbsp;can in the behavior of, say, <a href="http://intelligence.org/2014/02/15/andre-platzer-on-verifying-cyber-physical-systems/">the&nbsp;relatively limited AI software&nbsp;that controls the European Train Control System</a>. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even <em>more</em> wildly unimaginable to AI researchers than "safe AGI."</p> <p>What about&nbsp;<strong>superintelligence control</strong> or&nbsp;<strong>AGI control</strong>, as in <a href="http://smile.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0199678111/">Bostrom (2014)</a>? "AGI control" is perhaps more believable than "high-assurance AGI" or "safe AGI," since it brings to mind AI <em>containment</em> methods,&nbsp;which sound more feasible&nbsp;to most people than designing an unconstrained&nbsp;AGI that is somehow nevertheless safe. 
(It's okay if they learn&nbsp;<em>later</em> that containment probably isn't an ultimate solution to the problem.)</p> <p>On the other hand, it might provoke a reaction of "What, you don't think sentient robots have any rights, and you're free to control and confine them in any way you please? You're just repeating the immoral mistakes of the old slavemasters!" Which of course isn't true, but it takes some time&nbsp;to explain how I&nbsp;can think it's obvious that conscious machines have moral value while also being in favor of AGI control methods.</p> <p>How about&nbsp;<strong>ethical AGI?</strong> First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research&nbsp;assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don't think is true.</p> <p>What about&nbsp;<strong>beneficial AGI?</strong> That's better than "ethical AGI," I think, but like "ethical AGI" and "Friendly AI," the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be "nice" and "trustworthy" and "not harmful" and oh yeah it must be "virtuous" too, whatever that means.</p> <p>So yeah, I dunno. I think "AGI safety" is my least-disliked&nbsp;term these days, but I wish I knew of some better options.</p> lukeprog P2evgLpCZA2tRRJAR 2014-06-15T19:53:20.340Z An onion strategy for AGI discussion https://www.lesswrong.com/posts/mfHvyPL2d6v7pXkjs/an-onion-strategy-for-agi-discussion <p><small>Cross-posted from <a href="http://lukemuehlhauser.com/an-onion-strategy-for-agi-discussion/">my blog</a>.</small></p> <p>"<a href="http://lukemuehlhauser.com/wp-content/uploads/Hammond-et-al-The-stabilization-of-environments.pdf">The stabilization of environments</a>" is a paper&nbsp;about AIs that&nbsp;reshape their environments to&nbsp;make it easier to achieve their goals. This is typically called&nbsp;<em>enforcement</em>, but they prefer the term&nbsp;<em>stabilization</em> because it "sounds less hostile."</p> <p>"I'll open the pod bay doors,&nbsp;Dave, but then I'm going to stabilize the ship..."</p> <p><a href="http://commonsenseatheism.com/wp-content/uploads/2013/08/Sparrow-In-vitro-eugenics.pdf">Sparrow (2013)</a>&nbsp;takes&nbsp;the opposite approach to plain vs. dramatic language. Rather than&nbsp;using a modest&nbsp;term like&nbsp;<em><a href="http://theuncertainfuture.com/faq.html#7">iterated embryo selection</a></em>, Sparrow prefers&nbsp;the phrase&nbsp;<em>in vitro eugenics</em>.&nbsp;Jeepers.</p> <p>I suppose that's more&nbsp;likely to provoke&nbsp;public discussion, but...&nbsp;will much good come of that public discussion?&nbsp;The public had a needless freak-out about in vitro fertilization&nbsp;back in the 60s and 70s and then, as soon as the first IVF baby was born in 1978, <a href="http://intelligence.org/files/EmbryoSelection.pdf">decided they were in favor of it</a>.</p> <p>Someone&nbsp;recently&nbsp;suggested I use an&nbsp;"<strong>onion strategy</strong>" for the discussion of novel technological risks.&nbsp;The outermost layer of the&nbsp;communication onion would be&nbsp;aimed at&nbsp;the general public, and focus on benefits rather than risks, so as not to provoke an unproductive panic.
A second layer for a&nbsp;specialist&nbsp;audience could&nbsp;include&nbsp;a more detailed elaboration of the risks. The most complete discussion of risks and mitigation options would be&nbsp;reserved for&nbsp;technical publications that are read only by professionals.</p> <p>Eric Drexler <a href="http://lukemuehlhauser.com/wp-content/uploads/Giles-Nanotech-takes-small-steps-toward-burying-grey-goo.pdf">seems</a> to wish he had more successfully used&nbsp;an onion strategy when writing about nanotechnology.&nbsp;<a href="http://smile.amazon.com/Engines-Creation-Nanotechnology-Library-Science/dp/0385199732/"><em>Engines of Creation</em></a>&nbsp;included frank discussions of both the benefits and risks of nanotechnology, including the "<a href="http://en.wikipedia.org/wiki/Grey_goo">grey goo</a>" scenario that was discussed widely in the media and used as the premise for the bestselling novel&nbsp;<em><a href="http://en.wikipedia.org/wiki/Prey_(novel)">Prey</a>.</em></p> <p>Ray Kurzweil may be using an onion strategy, or at least keeping his writing&nbsp;in the outermost layer. If you look carefully, chapter 8&nbsp;of&nbsp;<em><a href="http://smile.amazon.com/Singularity-Near-Humans-Transcend-Biology-ebook/dp/B000QCSA7C/">The Singularity is Near</a></em>&nbsp;takes technological risks pretty seriously, and yet&nbsp;it's written in such a way that most people who read the book seem to come away with an overwhelmingly optimistic perspective on technological change.</p> <p>George Church may be following an onion strategy.&nbsp;<em><a href="http://smile.amazon.com/Regenesis-Synthetic-Biology-Reinvent-Ourselves-ebook/dp/B00HTQ326W/">Regenesis</a></em>&nbsp;also contains a chapter on the risks of advanced bioengineering, but it's presented&nbsp;as an "epilogue" that many readers will skip.</p> <p>Perhaps those of us writing about AGI for the general public should try to discuss:</p> <ul> <li><em>astronomical stakes</em> rather than&nbsp;<em>existential risk</em></li> <li><em>Friendly AI</em> rather than&nbsp;<em>AGI risk</em> or&nbsp;the<em> superintelligence control problem</em></li> <li>the orthogonality<em> thesis</em> and&nbsp;<em>convergent instrumental values</em> and&nbsp;<em>complexity of values</em> rather than "doom by default"</li> <li>etc.</li> </ul> <p>MIRI doesn't have any official recommendations on the matter, but these days I find myself leaning toward an&nbsp;onion strategy.</p> lukeprog mfHvyPL2d6v7pXkjs 2014-05-31T19:08:24.784Z Can noise have power? https://www.lesswrong.com/posts/8sitELf6z8zGPKDRm/can-noise-have-power <p>One of the most interesting debates on Less Wrong that seems like it should be definitively resolvable is the one between Eliezer Yudkowsky, Scott Aaronson, and others on <a href="/lw/vq/the_weighted_majority_algorithm/">The Weighted Majority Algorithm</a>. I'll reprint the debate here in case anyone wants to comment further on it.</p> <p>In that post, Eliezer argues that "noise hath no power" (read the post for details). Scott disagreed. 
He&nbsp;<a href="/lw/vq/the_weighted_majority_algorithm/owm">replied</a>:</p> <blockquote> <p>...Randomness provably never helps in average-case complexity (i.e., where you fix the probability distribution over inputs) -- since given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.</p> <p>On the other hand, if you care about the worst-case running time, then there are settings (such as query complexity) where randomness provably does help. For example, suppose you're given n bits, you're promised that either n/3 or 2n/3 of the bits are 1's, and your task is to decide which. Any deterministic strategy to solve this problem clearly requires looking at 2n/3 + 1 of the bits. On the other hand, a randomized sampling strategy only has to look at O(1) bits to succeed with high probability.</p> <p>Whether randomness ever helps in worst-case polynomial-time computation is the P versus BPP question, which is in the same league as P versus NP. It's conjectured that P=BPP (i.e., randomness never saves more than a polynomial). This is known to be true if really good pseudorandom generators exist, and such PRG's can be constructed if certain problems that seem to require exponentially large circuits, really do require them (see <a href="http://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/IW97/proc.pdf">this paper</a> by Impagliazzo and Wigderson). But we don't seem close to proving P=BPP unconditionally.</p> </blockquote> <p>Eliezer <a href="/lw/vq/the_weighted_majority_algorithm/owp">replied</a>:</p> <blockquote> <p>Scott, I don't dispute what you say. I just suggest that the confusing term "in the worst case" be replaced by the more accurate phrase "supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated 'random'".</p> </blockquote> <p>Scott <a href="/lw/vq/the_weighted_majority_algorithm/owq">replied</a>:</p> <blockquote> <p>I often tell people that theoretical computer science is basically mathematicized paranoia, and that this is the reason why Israelis so dominate the field. You're absolutely right: we do typically assume the environment is an adversarial superintelligence. But that's not because we literally think it is one, it's because we don't presume to know which distribution over inputs the environment is going to throw at us. (That is, we lack the self-confidence to impose any particular prior on the inputs.) We do often assume that, if we generate random bits ourselves, then the environment isn't going to magically take those bits into account when deciding which input to throw at us. (Indeed, if we like, we can easily generate the random bits after seeing the input -- not that it should make a difference.)</p> <p>Average-case analysis is also well-established and used a great deal. But in those cases where you can solve a problem without having to assume a particular distribution over inputs, why complicate things unnecessarily by making such an assumption? Who needs the risk?</p> </blockquote> <p>And later <a href="/lw/vq/the_weighted_majority_algorithm/t6b">added</a>:</p> <blockquote> <p>...Note that I also enthusiastically belong to a "derandomize things" crowd! 
The difference is, I think derandomizing is hard work (sometimes possible and sometimes not), since I'm unwilling to treat the randomness of the problems the world throws at me on the same footing as randomness I generate myself in the course of solving those problems. (For those watching at home tonight, I hope the differences are now reasonably clear...)</p> </blockquote> <p>Eliezer <a href="/lw/vq/the_weighted_majority_algorithm/t6c">replied</a>:</p> <blockquote> <p>I certainly don't say "it's not hard work", and the environmental probability distribution should not look like the probability distribution you have over your random numbers - it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming "worst case". Optimizing for the worst case in environments that aren't actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.</p> <p>I would defend the following sort of statement: While often it's not worth the computing power to take advantage of all the believed-in regularity of your probability distribution over the environment, any environment that you can't get away with treating as effectively random, probably has enough structure to be worth exploiting instead of randomizing.</p> <p>(This isn't based on career experience, it's how I would state my expectation given my prior theory.)</p> </blockquote> <p>Scott <a href="/lw/vq/the_weighted_majority_algorithm/t6d">replied</a>:</p> <blockquote> <p>&gt; "once you know what your probability distribution is..."</p> <p>I'd merely stress that that's an enormous "once." When you're writing a program (which, yes, I used to do), normally you have only the foggiest idea of what a typical input is going to be, yet you want the program to work anyway. This is not just a hypothetical worry, or something limited to cryptography: people have actually run into strange problems using pseudorandom generators for Monte Carlo simulations and hashing (see <a href="https://web.archive.org/web/20090107093832/http://expertvoices.nsdl.org/cornell-cs322/2008/04/23/monte-carlo-gone-wrong/">here</a> for example, or Knuth vol 2).</p> <p>Even so, intuition suggests it should be possible to design PRG's that defeat anything the world is likely to throw at them. I share that intuition; it's the basis for the (yet-unproved) P=BPP conjecture.</p> <p>"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." --von Neumann</p> </blockquote> <p>And that's where the debate drops off, at least between Eliezer and Scott, at least on that thread.</p>
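<p>To make Scott's n/3-vs-2n/3 example concrete, here's a quick illustrative Python sketch of the randomized strategy he describes (the sketch and its parameters, such as the sample budget of 100, are mine and are arbitrary): sample a constant number of positions and guess whichever promised fraction of 1's is closer to the sample mean. By a Hoeffding bound the error probability falls exponentially in the number of samples, so the budget needed for any fixed error rate is a constant independent of n, while (as Scott notes) a deterministic strategy needs to look at 2n/3 + 1 bits in the worst case.</p>
<pre><code>import random

def guess_ones_count(bits, samples=100):
    """Randomized strategy for the promise problem: the input has either
    n/3 or 2n/3 ones. Sample a constant number of positions (with
    replacement) and return whichever promised count is closer, as a
    fraction, to the observed sample mean. By Hoeffding, the error
    probability shrinks exponentially in the sample budget, independent
    of n. (Illustrative sketch; the budget of 100 is an arbitrary choice.)"""
    n = len(bits)
    mean = sum(bits[random.randrange(n)] for _ in range(samples)) / samples
    return min((n // 3, 2 * n // 3), key=lambda k: abs(mean - k / n))

# Demo with n = 3000 and exactly 2n/3 ones, randomly ordered.
n = 3000
bits = [1] * (2 * n // 3) + [0] * (n // 3)
random.shuffle(bits)
print(guess_ones_count(bits))  # prints 2000 with high probability
</code></pre>
<p>(With 100 samples the failure probability is already below one percent; driving it lower just means a larger constant, still independent of n.)</p> lukeprog 8sitELf6z8zGPKDRm 2014-05-23T04:54:32.829Z Calling all MIRI supporters for unique May 6 giving opportunity! https://www.lesswrong.com/posts/FuoPkThduHrRgxRSR/calling-all-miri-supporters-for-unique-may-6-giving <p><small>(Cross-posted from <a href="http://intelligence.org/2014/05/04/calling-all-miri-supporters/">MIRI's blog</a>.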
<a href="http://intelligence.org/">MIRI</a> maintains Less Wrong, with generous help from <a href="http://trikeapps.com/">Trike Apps</a>, and much of the core content is written by salaried MIRI staff members.)</small></p> <p>Update: I'm liveblogging the fundraiser <a href="http://intelligence.org/2014/05/06/liveblogging-the-svgives-fundraiser/">here</a>.</p> <h2 style="text-align:center">Read our strategy below, then <a href="http://svgives.razoo.com/story/Machine-Intelligence-Research-Institute">give here</a>!</h2> <p><a href="http://svgives.razoo.com/story/Machine-Intelligence-Research-Institute"><img style="float: right; padding: 10px;" src="https://intelligence.org/wp-content/uploads/2014/05/SVGives-logo-lrg.jpg" alt="SVGives logo lrg" width="230" height="178" /></a>As previously <a title="Help MIRI in a Massive 24-Hour Fundraiser on May 6th" href="http://intelligence.org/2014/04/25/may-6th-miri-participating-in-massive-24-hour-online-fundraiser/">announced</a>,&nbsp;MIRI is participating in a massive 24-hour fundraiser on May 6th, called <a href="http://svgives.razoo.com/giving_events/svg14/home">SV Gives</a>. This is a unique opportunity for all MIRI supporters to increase the impact of their donations. To be successful we'll need to pre-commit to a strategy and see it through. <strong>If you plan to give at least $10 to MIRI sometime this year, during this event would be the best time to do it!</strong></p> <h2><br /></h2> <h2>The plan</h2> <p>We need all hands on deck to help us win the following prize as many times as possible:</p> <blockquote>$2,000 prize for the nonprofit that has the most individual donors in an hour, every hour for 24 hours.</blockquote> <p>To paraphrase, <em>every hour</em>, there is a $2,000 prize for the organization that has the most individual donors during that hour. <strong>That's a total of $48,000 in prizes, from sources that wouldn't normally give to MIRI.&nbsp;</strong> The minimum donation is $10, and an individual donor can give as many times as they want. Therefore we ask our supporters to:</p> <ol> <li><strong><a href="http://svgives.razoo.com/story/Machine-Intelligence-Research-Institute">give</a> $10 an hour, during <em>every hour</em> of the fundraiser that they are awake (I'll be up and donating for all 24 hours!)</strong>;</li> <li>for those whose giving budgets won't cover all those hours, see below for list of which hours you should privilege; and</li> <li>publicize this effort as widely as possible.</li> </ol> <h3 style="text-align: center;">International donors, we especially need your help!</h3> <p>MIRI has a strong community of international supporters, and this gives us a distinct advantage! While North America sleeps, you'll be awake, ready to target all of the overnight $2,000 hourly prizes.</p> <p><a id="more"></a></p> <h2>Hours to target in order of importance</h2> <p>To increase our chances of winning these prizes we want to preferentially target the hours that will see the least donation traffic from donors of other participating organizations. Below are the top 12 hours we'd like to target in order of importance. Remember that all times are in Pacific Time. 
(Click on an hour to see what time it is in your timezone.)</p> <ul> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T01&amp;p1=224&amp;ah=1">1 am hour</a>&nbsp;&nbsp;(01:00&ndash;01:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T02&amp;p1=224&amp;ah=1">2 am hour</a>&nbsp;(02:00&ndash;02:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T03&amp;p1=224&amp;ah=1">3 am hour</a>&nbsp;(03:00&ndash;03:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T04&amp;p1=224&amp;ah=1">4 am hour</a>&nbsp;(04:00&ndash;04:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T05&amp;p1=224&amp;ah=1">5 am hour</a>&nbsp;(05:00&ndash;05:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T06&amp;p1=224&amp;ah=1">6 am hour</a>&nbsp;(06:00&ndash;06:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T23&amp;p1=224&amp;ah=1">11 pm hour</a>&nbsp;(23:00&ndash;23:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T07&amp;p1=224&amp;ah=1">7 am hour</a> (07:00&ndash;07:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T22&amp;p1=224&amp;ah=1">10 pm hour</a> (22:00&ndash;22:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T08&amp;p1=224&amp;ah=1">8 am hour</a>&nbsp;(08:00&ndash;08:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T17&amp;p1=224&amp;ah=1">5 pm hour</a> (17:00&ndash;17:59 PT)</li> <li><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140506T21&amp;p1=224&amp;ah=1">9 pm hour</a> (21:00&ndash;21:59 PT)</li> </ul> <p>For the 5 pm hour there is an additional prize I think we can win:</p> <blockquote>$1,000 golden ticket added to the first 50 organizations receiving gifts in the 5 pm hour.</blockquote> <p><strong>So if you are giving in the 5 pm hour try and give right at the beginning of the hour.</strong></p> <h3 style="text-align: center;">Bottom line, for every hour you are awake, <a href="http://svgives.razoo.com/story/Machine-Intelligence-Research-Institute">give</a> $10 an hour.</h3> <h3 style="text-align: center;">&nbsp;Give preferentially to the hours above, if unable to give during all waking hours.</h3> <p>We also have plans to target the $300,000 in matching funds up for grabs during the event. If you would like to contribute $500 or more to this effort, shoot Malo an email at <a href="mailto:malo@intelligence.org">malo@intelligence.org</a>. &nbsp;</p> <p>For those who want to follow along and contribute to the last minute planning, as well as receive updates and giving reminders during the event, <strong><a href="https://docs.google.com/forms/d/1kldd0puHup8t1lnnWGVCA0IXz_rnEU1vwcqfmTeEXKc/viewform">sign up here</a>.</strong></p> lukeprog FuoPkThduHrRgxRSR 2014-05-04T23:45:25.469Z Is my view contrarian? 
https://www.lesswrong.com/posts/kFiz7Etau5HNMdKdx/is-my-view-contrarian <p><small>Previously: <a href="http://www.overcomingbias.com/2009/11/contrarian-excuses.html">Contrarian Excuses</a>, <a href="/lw/1kh/the_correct_contrarian_cluster/">The Correct Contrarian Cluster</a>, <a href="/lw/28i/what_is_bunk/">What is bunk?</a>, <a href="/lw/iao/common_sense_as_a_prior/">Common Sense as a Prior</a>, <a href="/lw/iu0/trusting_expert_consensus/">Trusting Expert Consensus</a>, <a href="http://www.overcomingbias.com/2014/03/prefer-contrarian-questions-vs-answers.html">Prefer Contrarian Questions</a>.</small></p> <p>Robin Hanson once <a href="http://www.overcomingbias.com/2009/11/contrarian-excuses.html">wrote</a>:</p> <blockquote> <p>On average, contrarian views are less accurate than standard views. Honest contrarians should admit this, that neutral outsiders should assign most contrarian views a lower probability than standard views, though perhaps a high enough probability to warrant further investigation. Honest contrarians who expect reasonable outsiders to give their contrarian view more than normal credence should point to strong outside indicators that correlate enough with contrarians tending more to be right.</p> </blockquote> <p>I tend to think through the issue in three stages:</p> <ol> <li>When should I consider myself to be holding a contrarian<sup><a id="fnref:1" class="footnote" title="see footnote" href="#fn:1">[1]</a></sup> view? What is the relevant expert community?</li> <li>If I seem to hold a contrarian view, when do <em>I</em> have enough reason to think I&rsquo;m correct?</li> <li>If I seem to hold a <em>correct</em> contrarian view, what can I do to give <em>other</em> people good reasons to accept my view, or at least to take it seriously enough to examine it at length?</li> </ol> <p>I don&rsquo;t yet feel that I have &ldquo;answers&rdquo; to these questions, but in this post (and hopefully some future posts) I&rsquo;d like to organize some of what has been said before,<sup><a id="fnref:2" class="footnote" title="see footnote" href="#fn:2">[2]</a></sup> and push things a bit further along, in the hope that further discussion and inquiry will contribute toward significant progress in <a href="http://plato.stanford.edu/entries/epistemology-social/">social epistemology</a>.<sup><a id="fnref:3" class="footnote" title="see footnote" href="#fn:3">[3]</a></sup> Basically, I hope to say a bunch of obvious things, in a relatively well-organized fashion, so that less obvious things can be said from there.<sup><a id="fnref:4" class="footnote" title="see footnote" href="#fn:4">[4]</a></sup></p> <p>In this post, I&rsquo;ll just address stage 1. Hopefully I&rsquo;ll have time to revisit stages 2 and 3 in future posts.</p> <p>&nbsp;</p> <h4>Is my view contrarian?</h4> <h5>World model differences vs. value differences</h5> <p>Is my <a href="/lw/hx4/four_focus_areas_of_effective_altruism/">effective altruism</a> a contrarian view? It seems to be more of a contrarian <em>value judgment</em> than a contrarian <em>world model</em>,<sup><a id="fnref:5" class="footnote" title="see footnote" href="#fn:5">[5]</a></sup> and by &ldquo;contrarian view&rdquo; I tend to mean &ldquo;contrarian world model.&rdquo; Some apparently contrarian views are probably actually contrarian <em>values</em>.</p> <p>&nbsp;</p> <h5>Expert consensus</h5> <p>Is my <a href="http://commonsenseatheism.com/">atheism</a> a contrarian view? 
It&rsquo;s definitely a world model, not a value judgment, and <a href="https://www.cia.gov/library/publications/the-world-factbook/fields/2122.html#xx">only 2% of people are atheists</a>.</p> <p>But what&rsquo;s the relevant <em>expert</em> population, here? Suppose it&rsquo;s &ldquo;academics who specialize in the arguments and evidence concerning whether a god or gods exist.&rdquo; If so, then the expert population is probably dominated by academic theologians and religious philosophers, and my atheism is a contrarian view.</p> <p>We need <a href="/lw/4ba/some_heuristics_for_evaluating_the_soundness_of/">some heuristics</a> for evaluating the soundness of the academic consensus in different fields. <sup><a id="fnref:6" class="footnote" title="see footnote" href="#fn:6">[6]</a></sup></p> <p>For example, we should consider the selection effects operating on communities of experts. If someone doesn&rsquo;t believe in God, they&rsquo;re unlikely to spend their career studying arcane arguments for and against God&rsquo;s existence. So most people who specialize in this topic are theists, but nearly all of them were theists <em>before</em> they knew the arguments.</p> <p>Perhaps instead the relevant expert community is &ldquo;scholars who study the fundamental nature of the universe&rdquo; &mdash; maybe, philosophers and physicists? They&rsquo;re mostly atheists. <sup><a id="fnref:7" class="footnote" title="see footnote" href="#fn:7">[7]</a></sup> This is starting to get pretty ad-hoc, but maybe that&rsquo;s unavoidable.</p> <p>What about my view that the overall long-term impact of <a href="http://intelligence.org/2013/08/11/what-is-agi/">AGI</a> will be, most likely, extremely bad? A recent survey of the top 100 authors in artificial intelligence (by citation index)<sup><a id="fnref:8" class="footnote" title="see footnote" href="#fn:8">[8]</a></sup> suggests that my view is somewhat out of sync with the views of those researchers.<sup><a id="fnref:9" class="footnote" title="see footnote" href="#fn:9">[9]</a></sup> But is that the relevant expert population? My impression is that AI experts know a lot about contemporary AI methods, especially within their subfield, but usually haven&rsquo;t thought much about, or read much about, long-term AI impacts.</p> <p>Instead, perhaps I&rsquo;d need to survey &ldquo;<a href="http://intelligence.org/2013/05/01/agi-impacts-experts-and-friendly-ai-experts/">AGI impact experts</a>&rdquo; to tell whether my view is contrarian. But who is that, exactly? There&rsquo;s no standard credential.</p> <p>Moreover, the most plausible candidates around today for &ldquo;AGI impact experts&rdquo; are &mdash; like the &ldquo;experts&rdquo; of many other fields &mdash; mere &ldquo;scholastic experts,&rdquo; in that they<sup><a id="fnref:10" class="footnote" title="see footnote" href="#fn:10">[10]</a></sup> know a lot about the arguments and evidence typically brought to bear on questions of long-term AI outcomes.<sup><a id="fnref:11" class="footnote" title="see footnote" href="#fn:11">[11]</a></sup> They generally are <em>not</em> experts in the sense of &ldquo;<a href="http://commonsenseatheism.com/wp-content/uploads/2012/12/Ericsson-An-Introduction-to-Cambridge-Handbook-of-Expertise-and-Expert-Performance.pdf">Reliably superior performance on representative tasks</a>&rdquo; &mdash; they don&rsquo;t have uniquely good track records on predicting long-term AI outcomes, for example. 
As far as I know, they don&rsquo;t even have uniquely good track records on predicting short-term geopolitical or sci-tech outcomes &mdash; e.g. they aren&rsquo;t among the &ldquo;<a href="http://www.edge.org/conversation/how-to-win-at-forecasting">super forecasters</a>&rdquo; discovered in <a href="http://www.iarpa.gov/Programs/ia/ACE/ace.html">IARPA&rsquo;s forecasting tournaments</a>.</p> <p>Furthermore, we might start to worry about selection effects, again. E.g. if we ask AGI experts when they think AGI will be built, they may be overly optimistic about the timeline: after all, if they didn&rsquo;t think AGI was feasible soon, they probably wouldn&rsquo;t be focusing their careers on it.</p> <p>Perhaps we can salvage this approach for determining whether one has a contrarian view, but for now, let&rsquo;s consider another proposal.</p> <p>&nbsp;</p> <h5>Mildly extrapolated elite opinion</h5> <p>Nick Beckstead instead <a href="/lw/iao/common_sense_as_a_prior/">suggests </a> that, at least as a strong prior, one should believe what one thinks &ldquo;a broad coalition of trustworthy people would believe if they were trying to have accurate views and they had access to [one&rsquo;s own] evidence.&rdquo;<sup><a id="fnref:12" class="footnote" title="see footnote" href="#fn:12">[12]</a></sup> Below, I&rsquo;ll propose a modification of Beckstead&rsquo;s approach which aims to address the &ldquo;Is my view contrarian?&rdquo; question, and I&rsquo;ll call it the &ldquo;mildly extrapolated elite opinion&rdquo; (MEEO) method for determining the relevant expert population. <sup><a id="fnref:13" class="footnote" title="see footnote" href="#fn:13">[13]</a></sup></p> <p>First: which people are &ldquo;trustworthy&rdquo;? With Beckstead, I favor &ldquo;giving more weight to the opinions of people who can be shown to be trustworthy by clear indicators that many people would accept, rather than people that seem trustworthy to you personally.&rdquo; (This guideline aims to avoid parochialism and self-serving cognitive biases.)</p> <p>What are some &ldquo;clear indicators that many people would accept&rdquo;? Beckstead suggests:</p> <blockquote> <p>IQ, business success, academic success, generally respected scientific or other intellectual achievements, wide acceptance as an intellectual authority by certain groups of people, or success in any area where there is intense competition and success is a function of ability to make accurate predictions and good decisions&hellip;</p> </blockquote> <blockquote> <p>Of course, trustworthiness can also be domain-specific. Very often, elite common sense would recommend deferring to the opinions of experts (e.g., listening to what physicists say about physics, what biologists say about biology, and what doctors say about medicine). In other cases, elite common sense may give partial weight to what putative experts say without accepting it all (e.g. economics and psychology). In other cases, they may give less weight to what putative experts say (e.g. sociology and philosophy).</p> </blockquote> <p>Hence MEEO outsources the challenge of evaluating academic consensus in different fields to the &ldquo;generally trustworthy people.&rdquo; But in doing so, it raises several new challenges. How do we determine which people are trustworthy? How do we &ldquo;mildly extrapolate&rdquo; their opinions? 
How do we weight those mildly extrapolated opinions in combination?</p> <p>This approach might also be promising, or it might be even harder to use than the &ldquo;expert consensus&rdquo; method.</p> <p>&nbsp;</p> <h4>My approach</h4> <p>In practice, I tend to do something like this:</p> <ul> <li>To determine whether my view is contrarian, I ask whether there&rsquo;s a fairly obvious, relatively trustworthy expert population on the issue. If there is, I try to figure out what their consensus on the matter is. If it&rsquo;s different than my view, I conclude I have a contrarian view.</li> <li>If there <em>isn&rsquo;t</em> an obvious trustworthy expert population on the issue from which to extract a consensus view, then I basically give up on step 1 (&ldquo;Is my view contrarian?&rdquo;) and just move to the model combination in step 2 (see below), retaining pretty large uncertainty about how contrarian my view might be.</li> </ul> <h4><br /></h4> <h4>When do I have good reason to think I&rsquo;m correct?</h4> <p>Suppose I conclude I have a contrarian view, as I plausibly have about long-term AGI outcomes,<sup><a id="fnref:14" class="footnote" title="see footnote" href="#fn:14">[14]</a></sup> and as I might have about the technological feasibility of preserving myself via cryonics.<sup><a id="fnref:15" class="footnote" title="see footnote" href="#fn:15">[15]</a></sup> How much evidence do I need to conclude that my view is justified despite the informed disagreement of others?</p> <p>I&rsquo;ll try to tackle that question in a future post. Not surprisingly, my approach is a kind of <a href="/lw/hzu/model_combination_and_adjustment/">model combination and adjustment</a>.</p> <p>&nbsp;</p> <p>&nbsp;</p> <hr /> <p><small> <ol> <li id="fn:1"> <p>I don&rsquo;t have a concise definition for what counts as a &ldquo;contrarian view.&rdquo; In any case, I don&rsquo;t think that searching for an exact definition of &ldquo;contrarian view&rdquo; is what matters. In an email conversation with me, Holden Karnofsky concurred, making the point this way: &ldquo;I agree with you that the idea of &lsquo;contrarianism&rsquo; is tricky to define. I think things get a bit easier when you start looking for patterns that should worry you rather than trying to Platonically define contrarianism&hellip; I find &lsquo;Most smart people think I&rsquo;m bonkers about X&rsquo; and &lsquo;Most people who have studied X more than I have <em>plus seem to generally think like I do</em> think I&rsquo;m wrong about X&rsquo; both worrying; I find &lsquo;Most smart people think I&rsquo;m wrong about X&rsquo; and &lsquo;Most people who spend their lives studying X within a system that seems to be clearly dysfunctional and to have a bad track record think I&rsquo;m bonkers about X&rsquo; to be less worrying.&rdquo; <a class="reversefootnote" title="return to article" href="#fnref:1">&nbsp;↩</a></p> </li> <li id="fn:2"> <p>For a diverse set of perspectives on the social epistemology of disagreement and contrarianism not influenced (as far as I know) by the Overcoming Bias and Less Wrong conversations about the topic, see <a href="http://fitelson.org/seminar/christensen.pdf">Christensen (2009)</a>; <a href="http://www.amazon.com/Cambridge-Expertise-Performance-Handbooks-Psychology-ebook/dp/B00AKE1Z1Y/">Ericsson et al. 
(2006)</a>; <a href="http://www.iel.carloalberto.org/public/pavel.k/pkuchar-consensus.pdf">Kuchar (forthcoming)</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2014/03/Miller-When-is-consensus-knowledge-based.pdf">Miller (2013)</a>; <a href="http://andrewgelman.com/2009/12/14/how_do_i_form_m/">Gelman (2009)</a>; <a href="http://www.bmartin.cc/pubs/95handbook2.pdf">Martin &amp; Richards (1995)</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2014/03/Shwed-Bearman-The-temporal-structure-of-scientific-consensus-formation.pdf">Shwed &amp; Bearman (2010)</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2014/03/Intemann-de-Melo-Martin-Are-there-limits-to-scientists-obligations-to-seek-and-engage-dissenters.pdf">Intemann &amp; de Melo-Martin (2013)</a>. Also see Wikipedia&rsquo;s article on <a href="http://en.wikipedia.org/wiki/Scientific_consensus">scientific consensus</a>. <a class="reversefootnote" title="return to article" href="#fnref:2">&nbsp;↩</a></p> </li> <li id="fn:3"> <p>I suppose I should mention that my entire inquiry here is, <em>ala</em> <a href="http://www.amazon.com/Knowledge-Social-World-Alvin-Goldman-ebook/dp/B002B557GE/">Goldman (1998)</a>, premised on the assumptions that (1) the point of epistemology is the pursuit of correspondence-theory truth, and (2) the point of <em>social</em> epistemology is to evaluate which social institutions and practices have instrumental value for producing true or well-calibrated beliefs. <a class="reversefootnote" title="return to article" href="#fnref:3">&nbsp;↩</a></p> </li> <li id="fn:4"> <p>I borrow this line from <a href="http://consc.net/papers/progress.pdf">Chalmers (2014)</a>: &ldquo;For much of the paper I am largely saying the obvious, but sometimes the obvious is worth saying so that less obvious things can be said from there.&rdquo; <a class="reversefootnote" title="return to article" href="#fnref:4">&nbsp;↩</a></p> </li> <li id="fn:5"> <p>Holden Karnofsky <a href="http://intelligence.org/2013/08/25/holden-karnofsky-interview/">seems to agree</a>: &ldquo;I think effective altruism falls somewhere on the spectrum between &lsquo;contrarian view&rsquo; and &lsquo;unusual taste.&rsquo; My commitment to effective altruism is probably better characterized as &lsquo;wanting/choosing to be an effective altruist&rsquo; than as &lsquo;believing that effective altruism is correct.&rsquo;&rdquo; <a class="reversefootnote" title="return to article" href="#fnref:5">&nbsp;↩</a></p> </li> <li id="fn:6"> <p>Without such heuristics, we can also rather quickly arrive at contradictions. For example, the majority of scholars who specialize in Allah&rsquo;s existence believe that Allah is the One True God, and the majority of scholars who specialize in Yahweh&rsquo;s existence believe that Yahweh is the One True God. Consistency isn&rsquo;t everything, but contradictions like this should still be a warning sign. <a class="reversefootnote" title="return to article" href="#fnref:6">&nbsp;↩</a></p> </li> <li id="fn:7"> <p>According to the <a href="http://philpapers.org/surveys/">PhilPapers Surveys</a>, 72.8% of philosophers are atheists, 14.6% are theists, and 12.6% categorized themselves as &ldquo;other.&rdquo; If we look only at metaphysicians, atheism remains dominant at 73.7%. If we look only at analytic philosophers, we again see atheism at 76.3%. 
As for physicists: <a href="http://commonsenseatheism.com/wp-content/uploads/2014/02/Larson-Witham-Scientists-are-still-keeping-the-faith.pdf">Larson &amp; Witham (1997)</a> found that 77.9% of physicists and astronomers are disbelievers, and <a href="http://commonsenseatheism.com/wp-content/uploads/2014/02/Pew-Research-Center-Scientific-achievements-less-prominent-than-a-decade-ago.pdf">Pew Research Center (2009)</a> found that 71% of physicists and astronomers did not believe in a god. <a class="reversefootnote" title="return to article" href="#fnref:7">&nbsp;↩</a></p> </li> <li id="fn:8"> <p>Muller &amp; Bostrom (forthcoming). &ldquo;Future Progress in Artificial Intelligence: A Poll Among Experts.&rdquo; <a class="reversefootnote" title="return to article" href="#fnref:8">&nbsp;↩</a></p> </li> <li id="fn:9"> <p>But, this is unclear. First, I haven&rsquo;t read the forthcoming paper, so I don&rsquo;t yet have the full results of the survey, along with all its important caveats. Second, distributions of expert opinion can vary widely between polls. For example, <a href="http://arxiv.org/pdf/1301.1069v1.pdf">Schlosshauer et al. (2013)</a> reports the results of a poll given to participants in a 2011 quantum foundations conference (mostly physicists). When asked &ldquo;When will we have a working and useful quantum computer?&rdquo;, 9% said &ldquo;within 10 years,&rdquo; 42% said &ldquo;10&ndash;25 years,&rdquo; 30% said &ldquo;25&ndash;50 years,&rdquo; 0% said &ldquo;50&ndash;100 years,&rdquo; and 15% said &ldquo;never.&rdquo; But when the exact same questions were asked of participants at another quantum foundations conference just two years later, <a href="http://arxiv.org/pdf/1306.4646.pdf">Norsen &amp; Nelson (2013)</a> report, the distribution of opinion was substantially different: 9% said &ldquo;within 10 years,&rdquo; 22% said &ldquo;10&ndash;25 years,&rdquo; 20% said &ldquo;25&ndash;50 years,&rdquo; 21% said &ldquo;50&ndash;100 years,&rdquo; and 12% said &ldquo;never.&rdquo; <a class="reversefootnote" title="return to article" href="#fnref:9">&nbsp;↩</a></p> </li> <li id="fn:10"> <p>I say &ldquo;they&rdquo; in this paragraph, but I consider myself to be a plausible candidate for an &ldquo;AGI impact expert,&rdquo; in that I&rsquo;m unusually familiar with the arguments and evidence typically brought to bear on questions of long-term AI outcomes. I <em>also</em> don&rsquo;t have a uniquely good track record on predicting long-term AI outcomes, nor am I among the discovered &ldquo;super forecasters.&rdquo; I haven&rsquo;t participated in IARPA&rsquo;s forecasting tournaments myself because it would just be too time consuming. I would, however, very much like to see these super forecasters grouped into teams and tasked with forecasting longer-term outcomes, so that we can begin to gather scientific data on which psychological and computational methods result in the best predictive outcomes when considering long-term questions. Given how long it takes to acquire these data, we should start as soon as possible. 
<a class="reversefootnote" title="return to article" href="#fnref:10">&nbsp;↩</a></p> </li> <li id="fn:11"> <p><a href="http://commonsenseatheism.com/wp-content/uploads/2014/02/Weiss-Shanteau-Decloaking-the-privileged-expert.pdf">Weiss &amp; Shanteau (2012)</a> would call them &ldquo;privileged experts.&rdquo; <a class="reversefootnote" title="return to article" href="#fnref:11">&nbsp;↩</a></p> </li> <li id="fn:12"> <p>Beckstead&rsquo;s &ldquo;elite common sense&rdquo; prior and my &ldquo;mildly extrapolated elite opinion&rdquo; method are epistemic notions that involve some kind idealization or extrapolation of opinion. One earlier such proposal in social epistemology was Habermas&rsquo; &ldquo;ideal speech situation,&rdquo; a situation of unlimited discussion between free and equal humans. See Habermas&rsquo; &ldquo;Wahrheitstheorien&rdquo; in <a href="http://www.worldcat.org/title/wirklichkeit-und-reflexion-festschrift-walter-schulz-zum-60-geburtstag/oclc/462814955">Schulz &amp; Fahrenbach (1973)</a> or, for an English description, <a href="http://www.amazon.com/The-Idea-Critical-Theory-Philosophy/dp/0521284228/">Geuss (1981)</a>, pp. 65&ndash;66. See also the discussion in <a href="http://commonsenseatheism.com/wp-content/uploads/2014/03/Tucker-The-epistemic-significance-of-consensus.pdf">Tucker (2003)</a>, pp. 502&ndash;504. <a class="reversefootnote" title="return to article" href="#fnref:12">&nbsp;↩</a></p> </li> <li id="fn:13"> <p>Beckstead calls his method the &ldquo;elite common sense&rdquo; prior. I&rsquo;ve named my method differently for two reasons. First, I want to distinguish MEEO from Beckstead&rsquo;s prior, since I&rsquo;m using the method for a slightly different purpose. Second, I think &ldquo;elite common sense&rdquo; is a confusing term even for Beckstead&rsquo;s prior, since there&rsquo;s some extrapolation of views going on. But also, it&rsquo;s only a &ldquo;mild&rdquo; extrapolation &mdash; e.g. we aren&rsquo;t asking what elites would think if they knew <em>everything</em>, or if they could rewrite their cognitive software for better reasoning accuracy. <a class="reversefootnote" title="return to article" href="#fnref:13">&nbsp;↩</a></p> </li> <li id="fn:14"> <p>My rough impression is that among the people who seem to have thought long and hard about AGI outcomes, and seem to me to exhibit fairly good epistemic practices on most issues, my view on AGI outcomes is still an outlier in its pessimism about the likelihood of desirable outcomes. But it&rsquo;s hard to tell: there haven&rsquo;t been systematic surveys of the important-to-me experts on the issue. I also wonder whether my views about long-term AGI outcomes are more a matter of seriously tackling a contrarian <em>question</em> rather than being a matter of having a particularly contrarian <em>view</em>. On this latter point, see <a href="https://www.facebook.com/lukeprog/posts/10104316308390320?stream_ref=5">this Facebook discussion</a>. <a class="reversefootnote" title="return to article" href="#fnref:14">&nbsp;↩</a></p> </li> <li id="fn:15"> <p>I haven&rsquo;t seen a poll of cryobiologists on the likely future technological feasibility of cryonics. Even if there were such polls, I&rsquo;d wonder whether cryobiologists <em>also</em> had the relevant philosophical and neuroscientific expertise. I should mention that I&rsquo;m not personally signed up for cryonics, for <a href="/lw/guy/open_thread_march_115_2013/8kin">these reasons</a>. 
<a class="reversefootnote" title="return to article" href="#fnref:15">&nbsp;↩</a></p> </li> </ol> </small></p> lukeprog kFiz7Etau5HNMdKdx 2014-03-11T17:42:49.788Z Futurism's Track Record https://www.lesswrong.com/posts/6ycPKhdmDgWsoyvKd/futurism-s-track-record <p>It would be nice (and expensive) to get a systematic survey on this, but my impressions <a id="fnref:1" class="footnote" title="see footnote" href="#fn:1">[1]</a> after tracking down lots of past technology predictions, and reading histories of technological speculation and invention, and reading about &ldquo;elite common sense&rdquo; at various times in the past, are that:</p> <ul> <li>Elite common sense at a given time almost always <em>massively</em> underestimates what will be technologically feasible in the future.</li> <li>&ldquo;Futurists&rdquo; in history tend to be far more accurate about what will be technologically feasible (when they don&rsquo;t grossly violate known physics), but they are often too optimistic about timelines, and (like everyone else) show little ability to predict (1) the long-term social consequences of future technologies, or (2) the details of which (technologically feasible; successfully prototyped) things will make commercial sense, or be popular products.</li> </ul> <p>Naturally, as someone who thinks it&rsquo;s incredibly important to predict the long-term future as well as we can while also avoiding overconfidence, I try to put myself in a position to learn what past futurists were doing right, and what they were doing wrong. For example, I recommend: Be a fox not a hedgehog. Do calibration training. Know how your brain works. Build quantitative models even if you don&rsquo;t believe the outputs, <a href="/lw/jfm/another_critique_of_effective_altruism/aagj">so that</a> specific pieces of the model are easier to attack and update. Have broad confidence intervals over the timing of innovations. Remember to forecast future developments by looking at trends in many inputs to innovation, not just the &ldquo;calendar years&rdquo; input. Use <a href="/lw/hzu/model_combination_and_adjustment/">model combination</a>. Study history and learn from it. Etc.</p> <p>Anyway: do others who have studied the history of futurism, elite common sense, innovation, etc. have different impressions about futurism&rsquo;s track record? And, anybody want to do a PhD thesis examining futurism&rsquo;s track record? Or on some piece of it, <em>ala</em> <a href="http://intelligence.org/2013/09/04/how-effectively-can-we-plan-for-future-decades/">this</a> or <a href="http://intelligence.org/files/PredictingAI.pdf">this</a> or <a href="/lw/diz/kurzweils_predictions_good_accuracy_poor/">this</a>? :)</p> <div class="footnotes"> <hr /> <ol> <li id="fn:1"> <p>I should explain one additional piece of reasoning which contributes to my impressions on the matter. How do I think about futurist predictions of technologies that haven&rsquo;t yet been definitely demonstrated to be technologically feasible or infeasible? For these, I try to use something like the <a href="/lw/jl7/tricky_bets_and_truthtracking_fields/">truth-tracking fields proxy</a>. E.g. very few intellectual elites (outside Turing, von Neumann, Good, etc.) in 1955 thought AGI would be technologically feasible. By 1980, we&rsquo;d made a bunch of progress in computing and AI and neuroscience, and a much greater proportion of intellectual elites came to think AGI would be technologically feasible. Today, I think the proportion is even greater. 
The issue hasn&rsquo;t been &ldquo;definitely decided&rdquo; yet (from a social point of view), but things are strongly trending in favor of Good and Turing, and against (e.g.) Dreyfus. <a class="reversefootnote" title="return to article" href="#fnref:1">&nbsp;↩</a></p> </li> </ol></div> lukeprog 6ycPKhdmDgWsoyvKd 2014-01-29T20:27:24.738Z Tricky Bets and Truth-Tracking Fields https://www.lesswrong.com/posts/LzyN9wzEdfS3j5SmT/tricky-bets-and-truth-tracking-fields <p>While visiting Oxford for MIRI&rsquo;s <a href="http://intelligence.org/workshops/#november-2013">November 2013 workshop</a>, I had the pleasure of visiting a meeting of &ldquo;Skeptics in the Pub&rdquo; in the delightfully British-sounding town of <em>High Wycombe</em> in <em>Buckinghamshire</em>. (Say that aloud in a British accent and try not to grin; I dare you!)</p> <p>I presented a mildly drunk intro to applied rationality, followed by a 2-hour Q&amp;A that, naturally, wandered into the subject of why AI will inevitably eat the Earth. I must have been fairly compelling despite the beer, because at one point I noticed the bartenders were leaning uncomfortably over one end of the bar in order to hear me, ignoring thirsty customers at the other end.</p> <p>Anyhoo, at one point I was talking about the role of formal knowledge in applied rationality, so I explained <a href="/lw/dhg/an_intuitive_explanation_of_solomonoff_induction/">Solomonoff&rsquo;s lightsaber</a> and why it made me think the wave function never collapses.</p> <p>Someone &mdash; I can&rsquo;t recall who; let&rsquo;s say &ldquo;Bob&rdquo; &mdash; wisely asked, &ldquo;But if quantum interpretations all predict the same observations, what does it mean for you to say the wave function never collapses? What do you <em>anticipate</em>?&rdquo; <a id="fnref:1" class="footnote" title="see footnote" href="#fn:1">[1]</a></p> <p>Now, I don&rsquo;t actually know whether the <a href="http://plato.stanford.edu/entries/qm-manyworlds/#5">usual proposals</a> for experimental tests of collapse make sense, so instead I answered:</p> <blockquote> <p>Well, I think theoretical physics is truth-tracking enough that it <em>eventually</em> converges toward true theories, so one thing I anticipate as a result of favoring a no-collapse view is that a significantly greater fraction of physicists will reject collapse in 20 years, compared to today.</p> </blockquote> <p>Had Bob and I wanted to bet on whether the wave function collapses or not, that would have been an awfully tricky bet to settle. But if we roughly agree on the truth-trackingness of physics as a field, then we can use the consensus of physicists a decade or two from now as a proxy for physical truth, and bet on that instead.</p> <p>This won&rsquo;t work for some fields. For example, philosophy sometimes looks more like a random walk than a truth-tracking inquiry &mdash; or, more charitably, it tracks truth on the scale of <em>centuries</em> rather than <em>decades</em>. For example, did you know that one year after the cover of <em>TIME</em> asked &ldquo;Is God dead?&rdquo;, a philosopher named Alvin Plantinga launched a <a href="http://www.christianitytoday.com/ct/2008/july/13.22.html?paging=off">renaissance in Christian philosophy</a>, such that theism and Christian particularism were <em>more</em> commonly defended by analytic philosophers in the 1970s than they were in the 1930s? 
I also have the impression that moral realism was a more popular view in the 1990s than it was in the 1970s, and that physicalism is less common today than it was in the 1960s, but I&rsquo;m less sure about those.</p> <p>You can also do this for bets that are hard to settle for a different kind of reason, e.g. an <a href="/lw/ie/the_apocalypse_bet/">apocalypse bet</a>. <a id="fnref:2" class="footnote" title="see footnote" href="#fn:2">[2]</a> Suppose Bob and I want to bet on whether smarter-than-human AI is technologically feasible. Trouble is, if it&rsquo;s ever proven that superhuman AI is feasible, that event might overthrow the global economy, making it hard to collect the bet, or at least pointless.</p> <p>But suppose Bob and I agree that AI scientists, or computer scientists, or technology advisors to first-world governments, or some other set of experts, is likely to converge toward the true answer on the feasibility of superhuman AI as time passes, as humanity learns more, etc. Then we can instead make a bet on whether it will be the case, 20 years from now, that a significantly increased or decreased fraction of those experts will think superhuman AI is feasible.</p> <p>Often, there won&rsquo;t be acceptable polls of the experts at both times, for settling the bet. But domain experts typically have a general sense of whether some view has become more or less common in their field over time. So Bob and I could agree to poll a randomly chosen subset of our chosen expert community 20 years from now, asking them how common the view in question is at that time and how common it was 20 years earlier, and settle our bet that way.</p> <p>Getting the details right for this sort of long-term bet isn&rsquo;t trivial, but I don't see a fatal flaw. Is there a fatal flaw in the idea that I&rsquo;ve missed? <a id="fnref:3" class="footnote" title="see footnote" href="#fn:3">[3]</a></p> <p>&nbsp;</p> <div class="footnotes"> <hr /> <ol> <li id="fn:1"> <p>I can&rsquo;t recall exactly how the conversation went, but it was <em>something</em> like this. <a class="reversefootnote" title="return to article" href="#fnref:1">&nbsp;↩</a></p> </li> <li id="fn:2"> <p>See also Jones, <a href="http://econlog.econlib.org/archives/2012/09/how_to_bet_on_b.html">How to bet on bad futures</a>. <a class="reversefootnote" title="return to article" href="#fnref:2">&nbsp;↩</a></p> </li> <li id="fn:3"> <p>I also doubt I&rsquo;m the first person to describe this idea in writing: please link to other articles making this point if you know of any. <a class="reversefootnote" title="return to article" href="#fnref:3">&nbsp;↩</a></p> </li> </ol></div> lukeprog LzyN9wzEdfS3j5SmT 2014-01-29T08:52:38.889Z MIRI's Winter 2013 Matching Challenge https://www.lesswrong.com/posts/arpzMZyzod3Nsqr4J/miri-s-winter-2013-matching-challenge <p><strong>Update</strong>: The fundraiser has been completed! Details <a href="http://intelligence.org/2013/12/26/winter-2013-fundraiser-completed/">here</a>. The original post follows...</p> <p>&nbsp;</p> <p>&nbsp;</p> <p><small>(Cross-posted from <a href="http://intelligence.org/2013/12/02/2013-winter-matching-challenge/">MIRI's blog</a>. 
<a href="http://intelligence.org/">MIRI</a> maintains Less Wrong, with generous help from <a href="http://trikeapps.com/">Trike Apps</a>, and much of the core content is written by salaried MIRI staff members.)</small></p> <p>Thanks to <a href="http://en.wikipedia.org/wiki/Peter_Thiel">Peter Thiel</a>, every donation made to MIRI between now and January 15th, 2014 will be <strong>matched dollar-for-dollar</strong>!</p> <p>Also,&nbsp;<strong>gifts from "new large donors" will be matched 3x!</strong> That is, if you've given less than $5k to SIAI/MIRI ever, and you now give or pledge $5k or more, Thiel will donate $3 for every dollar you give or pledge.</p> <p>We don't know whether we'll be able to offer the 3:1 matching ever again, so if you're capable of giving $5k or more, we encourage you to take advantage of the opportunity while you can. Remember that:</p> <ul> <li>If you prefer to give monthly, no problem! If you pledge 6 months of monthly donations, your full 6-month pledge will be the donation amount to be matched. So if you give monthly, you can get 3:1 matching for only $834/mo (or $417/mo if you get matching from your employer).</li> <li>We accept <a href="http://en.wikipedia.org/wiki/Bitcoin">Bitcoin</a> (BTC) and <a href="http://en.wikipedia.org/wiki/Ripple_(monetary_system)">Ripple</a> (XRP), both of which have recently jumped in value. If the market value of your Bitcoin or Ripple is $5k or more on the day you make the donation, this will count for matching.</li> <li>If your employer matches your donations at 1:1 (check <a href="http://doublethedonation.com/miri">here</a>), then you can take advantage of Thiel's 3:1 matching by giving as little as $2,500 (because it's $5k after corporate matching).</li> </ul> <p><em>Please email <a href="mailto: malo@intelligence.org">malo@intelligence.org</a>&nbsp;if you intend on leveraging corporate matching or would like to pledge&nbsp;6 months of monthly donations, so that we can properly account for your contributions towards the fundraiser.</em></p> <p>Thiel's total match is capped at $250,000. The total amount raised will depend on how many people take advantage of 3:1 matching. We don't anticipate being able to hit the $250k cap without substantial use of 3:1 matching &mdash; so if you haven't given $5k thus far, please consider giving/pledging $5k or more during this drive. 
(If you'd like to know the total amount of your past donations to MIRI, just ask <a href="mailto: malo@intelligence.org">malo@intelligence.org</a>.)</p> <p align="center"><img src="https://intelligence.org/wp-content/uploads/2014/08/finished-winter-2013.png" alt="" /></p> <p style="text-align: center;">Now is your chance to <strong>double or quadruple your impact</strong> in funding our <a href="http://intelligence.org/research/">research program</a>.</p> <p align="center"><big><a href="https://intelligence.org/donate/#donation-methods">Donate Today</a></big></p> <p align="center"><img class="img-rounded shadowed aligncenter" src="https://intelligence.org/wp-content/uploads/2013/12/workshop-horizontal-3.jpg" alt="" /></p> <h3>Accomplishments Since Our July 2013 Fundraiser Launched:</h3> <ul> <li>Held three <strong><a href="http://intelligence.org/workshops/#past-workshops">research workshops</a></strong>, including our <a href="http://intelligence.org/2013/08/30/miris-november-2013-workshop-in-oxford/">first European workshop</a>.</li> <li><a href="http://intelligence.org/2013/10/01/upcoming-talks-at-harvard-and-mit/"><strong>Talks</strong> at MIT and Harvard</a>, by Eliezer Yudkowsky and Paul Christiano.</li> <li>Yudkowsky is blogging more <strong>Open Problems in Friendly AI</strong>...&nbsp;<a href="https://www.facebook.com/groups/233397376818827/">on Facebook</a>! (They're also being written up in a more conventional format.)</li> <li>New <strong>papers</strong>: (1)&nbsp;<a href="/r/discussion/lw/i8i/algorithmic_progress_in_six_domains/">Algorithmic Progress in Six Domains</a>; (2)&nbsp;<a href="http://intelligence.org/2013/10/30/new-paper-embryo-selection-for-cognitive-enhancement/">Embryo Selection for Cognitive Enhancement</a>; (3)&nbsp;<a href="http://intelligence.org/2013/11/27/new-paper-racing-to-the-precipice/">Racing to the Precipice</a>; (4) <a href="http://intelligence.org/files/PredictingAGI.pdf">Predicting AGI: What can we say when we know so little?</a></li> <li>New <strong>ebook</strong>: <a href="http://intelligence.org/ai-foom-debate/"><em>The Hanson-Yudkowsky AI-Foom Debate</em></a>.</li> <li>New <strong>analyses</strong>: (1) <a href="http://intelligence.org/2013/11/04/from-philosophy-to-math-to-engineering/">From Philosophy to Math to Engineering</a>; (2)&nbsp;<a href="http://intelligence.org/2013/09/12/how-well-will-policy-makers-handle-agi-initial-findings/">How well will policy-makers handle AGI?</a> (3)&nbsp;<a href="http://intelligence.org/2013/09/04/how-effectively-can-we-plan-for-future-decades/">How effectively can we plan for future decades?</a> (4)&nbsp;<a href="http://intelligence.org/2013/08/25/transparency-in-safety-critical-systems/">Transparency in Safety-Critical Systems</a>; (5)&nbsp;<a href="http://intelligence.org/2013/10/03/proofs/">Mathematical Proofs Improve But Don&rsquo;t Guarantee Security, Safety, and Friendliness</a>; (6)&nbsp;<a href="http://intelligence.org/2013/08/11/what-is-agi/">What is AGI?</a> (7)&nbsp;<a href="http://intelligence.org/2013/07/31/ai-risk-and-the-security-mindset/">AI Risk and the Security Mindset</a>; (8)&nbsp;<a href="http://intelligence.org/2013/10/18/richard-posner-on-ai-dangers/">Richard Posner on AI Dangers</a>; (9)&nbsp;<a href="http://intelligence.org/2013/10/19/russell-and-norvig-on-friendly-ai/">Russell and Norvig on Friendly AI</a>.</li> <li>New <strong>expert interviews</strong>: <a href="http://intelligence.org/2013/11/05/greg-morrisett-on-secure-and-reliable-systems-2/">Greg Morrisett</a> 
(Harvard),&nbsp;<a href="http://intelligence.org/2013/11/01/robin-hanson/">Robin Hanson</a> (GMU), <a href="http://intelligence.org/2013/09/25/paul-rosenbloom-interview/">Paul Rosenbloom</a> (USC), <a href="http://intelligence.org/2013/08/31/stephen-hsu-on-cognitive-genomics/">Stephen Hsu</a> (MSU),&nbsp;<a href="http://intelligence.org/2013/10/28/markus-schmidt-on-risks-from-novel-biotechnologies/">Markus Schmidt</a> (Biofaction), <a href="http://intelligence.org/2013/09/06/laurent-orseau-on-agi/">Laurent Orseau</a> (AgroParisTech), <a href="http://intelligence.org/2013/08/25/holden-karnofsky-interview/">Holden Karnofsky</a> (GiveWell),&nbsp;<a href="http://intelligence.org/2013/10/25/bas-steunebrink-on-sleight/">Bas Steunebrink</a> (IDSIA),&nbsp;<a href="http://intelligence.org/2013/10/21/hadi-esmaeilzadeh-on-dark-silicon/">Hadi Esmaeilzadeh</a>&nbsp;(GIT), <a href="http://intelligence.org/2013/07/17/beckstead-interview/">Nick Beckstead</a> (Oxford),&nbsp;<a href="http://intelligence.org/2013/08/04/benja-interview/">Benja Fallenstein</a> (Bristol), <a href="http://intelligence.org/2013/07/15/roman-interview/">Roman Yampolskiy</a> (U Louisville),&nbsp;<a href="http://intelligence.org/2013/10/18/ben-goertzel/">Ben Goertzel</a> (Novamente), and <a href="http://intelligence.org/2013/07/12/james-miller-interview/">James Miller</a> (Smith College).</li> <li>With <a href="http://www.leverageresearch.org/">Leverage Research</a>, we held a San Francisco <strong>book launch party</strong> for James Barratt's&nbsp;<a href="http://www.amazon.com/Our-Final-Invention-Artificial-Intelligence-ebook/dp/B00CQYAWRY/"><em>Our Final Invention</em></a>, which discusses MIRI's work at length. (If you live in the Bay Area and would like to be notified of local events, please tell malo@intelligence.org!)</li> </ul> <h3>How Will Marginal Funds Be Used?</h3> <ul> <li><strong>Hiring Friendly AI researchers</strong>, identified through our workshops, as they become available for full-time work at MIRI.</li> <li>Running <strong>more workshops</strong> (next one begins&nbsp;<a href="http://intelligence.org/2013/07/24/miris-december-2013-workshop/">Dec. 14th</a>), to make concrete Friendly AI research progress, to introduce new researchers to open problems in Friendly AI, and to identify candidates for MIRI to hire.</li> <li>Describing more <strong>open problems in Friendly AI</strong>. Our current strategy is for Yudkowsky to explain them as quickly as possible via Facebook discussion, followed by more structured explanations written by others in collaboration with Yudkowsky.</li> <li>Improving humanity's <strong>strategic understanding</strong> of what to do about superintelligence. In the coming months this will include (1) additional interviews and analyses on our blog, (2) a reader's guide for Nick Bostrom's forthcoming&nbsp;<a href="http://ukcatalogue.oup.com/product/9780199678112.do"><em>Superintelligence</em></a><a href="http://ukcatalogue.oup.com/product/9780199678112.do"> book</a>, and (3) an introductory ebook currently titled&nbsp;<em>Smarter Than Us.</em></li> </ul> <p>Other projects are still being surveyed for likely cost and impact.</p> <p><small>We appreciate your support for our work! <a href="https://intelligence.org/donate/#donation-methods">Donate now</a>, and seize a better than usual chance to move our work forward. If you have questions about donating, please contact Louie Helm at (510) 717-1477 or louie@intelligence.org. 
<a href="https://linkpeek.com">Screenshot Service</a> provided by LinkPeek.com.</small></p> lukeprog arpzMZyzod3Nsqr4J 2013-12-17T20:41:28.303Z A model of AI development https://www.lesswrong.com/posts/5zqxzJ7cz4d9coqBZ/a-model-of-ai-development <p>FHI has released a new tech report:</p> <p>Armstrong, Bostrom, and Shulman. <a href="http://www.fhi.ox.ac.uk/wp-content/uploads/Racing-to-the-precipice-a-model-of-artificial-intelligence-development.pdf">Racing to the Precipice: a Model of Artificial Intelligence Development.</a></p> <p>Abstract:</p> <blockquote> <p>This paper presents a simple model of an AI arms race, where several development teams race to build the first AI. Under the assumption that the first AI will be very powerful and transformative, each team is incentivized to finish first &mdash; by skimping on safety precautions if need be. This paper presents the Nash equilibrium of this process, where each team takes the correct amount of safety precautions in the arms race. Having extra development teams and extra enmity between teams can increase the danger of an AI-disaster, especially if risk taking is more important than skill in developing the AI. Surprisingly, information also increases the risks: the more teams know about each others&rsquo; capabilities (and about their own), the more the danger increases.</p> </blockquote> <p>The paper is short and readable; discuss it here!</p> <p>But my main reason for posting is to ask this question: <strong>What is the most similar work that you know of?</strong> I'd expect people to do this kind of thing for modeling nuclear security risks, and maybe other things, but I don't happen to know of other analyses like this.</p> lukeprog 5zqxzJ7cz4d9coqBZ 2013-11-28T13:48:20.083Z Gelman Against Parsimony https://www.lesswrong.com/posts/az2vsi8ugTWXZ3Lq2/gelman-against-parsimony <p>In <a href="http://andrewgelman.com/2004/12/10/against_parsimo/">two</a> <a href="http://andrewgelman.com/2005/04/20/against_parsimo_1/">posts</a>, Bayesian <a href="http://www.amazon.com/Bayesian-Analysis-Edition-Chapman-Statistical/dp/1439840954/">stats guru</a> Andrew Gelman argues against parsimony, though it seems to be favored 'round these parts, in particular <a href="/lw/jp/occams_razor/">Solomonoff Induction</a> and <a href="http://en.wikipedia.org/wiki/Bayesian_information_criterion">BIC</a> as <a href="/lw/cw1/open_problems_related_to_solomonoff_induction/">imperfect</a> formalizations of <a href="http://en.wikipedia.org/wiki/Occam%27s_razor">Occam's Razor.</a></p> <p>Gelman says:</p> <blockquote> <p>I&rsquo;ve never seen any good general justification for parsimony...</p> <p>Maybe it&rsquo;s because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that&rsquo;s even better.</p> <p>In practice, I often use simple models&ndash;because they are less effort to fit and, especially, to understand. But I don&rsquo;t kid myself that they&rsquo;re better than more complicated efforts!</p> <p>My favorite quote on this comes from Radford Neal&lsquo;s book, <em>Bayesian Learning for Neural Networks</em>, pp. 103-104: "Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. 
Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well."</p> <p>...</p> <p>...ideas like minimum-description-length, parsimony, and Akaike&rsquo;s information criterion, are particularly relevant when models are estimated using least squares, maximum likelihood, or some other similar optimization method.</p> <p>When using hierarchical models, we can avoid overfitting and get good descriptions without using parsimony&ndash;the idea is that the many parameters of the model are themselves modeled. <a href="http://andrewgelman.com/2004/12/10/against_parsimo/">See here for some discussion of Radford Neal&rsquo;s ideas in favor of complex models</a>, and <a href="http://www.stat.columbia.edu/~gelman/research/published/bois2.pdf">see here</a> for an example from my own applied research.</p> </blockquote> lukeprog az2vsi8ugTWXZ3Lq2 2013-11-24T15:23:32.773Z From Philosophy to Math to Engineering https://www.lesswrong.com/posts/TDTnut9pkT6hdKrf8/from-philosophy-to-math-to-engineering <p><small>Cross-posted from the <a href="http://intelligence.org/2013/11/04/from-philosophy-to-math-to-engineering/">MIRI blog</a>.</small></p> <p>For centuries, philosophers wondered how we could learn what causes what. Some argued it was impossible, or possible only via experiment. Others kept <a href="/lw/8ns/hack_away_at_the_edges/">hacking away</a> at the problem, <a href="http://commonsenseatheism.com/wp-content/uploads/2013/10/Pearl-The-Art-and-Science-of-Cause-and-Effect.pdf">clarifying ideas</a> like <em>counterfactual</em> and <em>probability</em> and <em>correlation</em> by making them more precise and coherent.</p> <p>Then, in the 1990s, a breakthrough: Judea Pearl and others <a href="http://www.amazon.com/Causality-ebook/dp/B00AKE1VYK/">showed</a> that, in principle, we can sometimes infer causal relations from data even without experiment, via the mathematical machinery of probabilistic graphical models.</p> <p>Next, engineers used this mathematical insight to write <a href="http://cran.r-project.org/web/packages/pcalg/index.html">software</a> that can, in seconds, infer causal relations from a data set of observations.</p> <p>Across the centuries, researchers had toiled away, pushing our understanding of causality from philosophy to math to engineering.</p> <p align="center"><a href="https://intelligence.org/wp-content/uploads/2013/10/From-Philosophy-to-Math-to-Engineering.jpg"><img class="aligncenter size-full wp-image-10570" src="https://intelligence.org/wp-content/uploads/2013/10/From-Philosophy-to-Math-to-Engineering-small.jpg" alt="From Philosophy to Math to Engineering (small)" /></a></p> <p>And so it is with&nbsp;<a href="http://en.wikipedia.org/wiki/Friendly_artificial_intelligence">Friendly AI</a>&nbsp;research. Current progress on each sub-problem of Friendly AI lies somewhere on a spectrum from philosophy to math to engineering.</p> <!--more--> <p>We began with some fuzzy philosophical ideas of what we want from a Friendly AI (FAI).&nbsp;We want it to be benevolent and powerful enough to eliminate suffering, protect us from natural catastrophes, help us explore the universe, and otherwise make life&nbsp;<em>awesome</em>. We want FAI to allow for moral progress, rather than immediately reshape the galaxy according to whatever our current values happen to be. 
We want FAI to remain beneficent even as it rewrites its core algorithms to become smarter and smarter. And so on.</p> <p>Small pieces of this philosophical puzzle have been broken off and <a href="/lw/hok/link_scott_aaronson_on_free_will/9546?context=1#comments">turned into</a> math, e.g. <a href="http://www.amazon.com/Causality-ebook/dp/B00AKE1VYK/">Pearlian causal analysis</a> and <a href="/lw/dhg/an_intuitive_explanation_of_solomonoff_induction/">Solomonoff induction</a>. Pearl's math has since been used to produce causal inference software that can be run on today's computers, whereas engineers have thus far succeeded in implementing (tractable approximations of) Solomonoff induction&nbsp;only for <a href="http://arxiv.org/pdf/0909.0801v2.pdf">very limited applications</a>.</p> <p>Toy versions of two pieces of the "stable self-modification" problem were transformed into math problems in <a href="http://intelligence.org/files/OntologicalCrises.pdf">de Blanc (2011)</a> and <a href="http://intelligence.org/files/TilingAgents.pdf">Yudkowsky &amp; Herreshoff (2013)</a>, though this was done to enable further insight via formal analysis, not to assert that these small pieces of the philosophical problem had been&nbsp;<em>solved</em>&nbsp;to the level of math.</p> <p>Thanks to Patrick LaVictoire and other <a href="http://intelligence.org/get-involved/#workshop">MIRI workshop</a> participants,<sup>1</sup> Douglas Hofstadter's FAI-relevant philosophical idea of "<a href="http://en.wikipedia.org/wiki/Superrationality">superrationality</a>" seems to have been, for the most part, successfully <a href="http://intelligence.org/files/RobustCooperation.pdf">transformed</a> into math, and a bit of the engineering work has also <a href="https://github.com/klao/provability/blob/master/modal.hs">been done</a>.</p> <p>I say "seems" because, while humans are fairly skilled at turning math into feats of practical engineering, we seem to be <a href="/lw/4zs/philosophy_a_diseased_discipline/">much</a> <a href="http://commonsenseatheism.com/wp-content/uploads/2012/10/Brennan-Scepticism-about-philosophy.pdf"><em>less</em></a> <a href="http://consc.net/papers/progress.pdf">skilled</a> at turning philosophy into math, without leaving anything out. For example, some very sophisticated thinkers have <a href="http://commonsenseatheism.com/wp-content/uploads/2013/10/Rathmanner-Hutter-A-philosophical-treatise-of-universal-induction.pdf">claimed</a>&nbsp;that "Solomonoff induction solves the problem of inductive inference," or <a href="http://www.amazon.com/Introduction-Kolmogorov-Complexity-Applications-Computer/dp/0387339981/">that</a> "Solomonoff has successfully invented a perfect theory of induction."&nbsp;And indeed, it certainly&nbsp;<em>seems</em> like a truly universal induction procedure. 
However, it <a href="/lw/cw1/open_problems_related_to_solomonoff_induction/">turns out</a> that Solomonoff induction <em>doesn't</em> fully solve the problem of inductive inference, for relatively subtle reasons.<sup>2</sup></p> <p>Unfortunately, philosophical mistakes like this could be fatal when humanity builds the first self-improving AGI (<a href="http://intelligence.org/files/AIPosNegFactor.pdf">Yudkowsky 2008</a>).<sup>3</sup> FAI-relevant philosophical work is, as Nick Bostrom says, "philosophy with a deadline."</p> <p>&nbsp;</p> <p>&nbsp;</p> <p><sup>1</sup> <small>And before them, <a href="http://mechroom.technion.ac.il/~moshet/progeqnote4.pdf">Moshe Tennenholtz</a>.</small></p> <p><sup>2</sup> <small>Yudkowsky plans to write more about how to improve on Solomonoff induction, later.</small></p> <p><small></small><sup>3</sup> <small>This is a specific instance of a problem Peter Ludlow described like <a href="http://leiterreports.typepad.com/blog/2013/10/progress-in-philosophy-revisited.html">this</a>: "the technological curve is pulling away from the philosophy curve very rapidly and is about to leave it completely behind."</small></p> lukeprog TDTnut9pkT6hdKrf8 2013-11-04T15:43:55.704Z The Inefficiency of Theoretical Discovery https://www.lesswrong.com/posts/jZtyt67PS2QfehkE8/the-inefficiency-of-theoretical-discovery <p><small>Previously: <a href="http://www.overcomingbias.com/2009/11/why-neglect-big-topics.html">Why Neglect Big Topics</a>.</small></p> <p>Why was there no serious philosophical discussion of <a href="http://wiki.lesswrong.com/wiki/Moral_uncertainty">normative uncertainty</a> until <a href="http://commonsenseatheism.com/wp-content/uploads/2013/04/Hudson-Subjectivization-in-Ethics.pdf">1989</a>, given that all the necessary ideas and tools were present at the time of Jeremy Bentham?</p> <p>Why did no professional philosopher analyze I.J. Good&rsquo;s important &ldquo;intelligence explosion&rdquo; thesis (from 1959<sup><a id="fnref:1" href="#fn:1">1</a></sup>) until <a href="http://www.consc.net/papers/singularityjcs.pdf">2010</a>?</p> <p>Why was <a href="/lw/h1k/reflection_in_probabilistic_logic/">reflectively consistent probabilistic metamathematics</a> not described until 2013, given that the ideas it builds on go back at least to the 1940s?</p> <p>Why did it take <a href="http://commonsenseatheism.com/wp-content/uploads/2012/09/Spohn-Dependency-equilibria-and-the-causal-structure-of-decision-and-game-situations.pdf">until 2003</a> for professional philosophers to begin updating causal decision theory for the age of causal Bayes nets, and until <a href="http://prestongreene.com/Papers_files/Greene,%20Rationality%20and%20Success.pdf">2013</a> to formulate a reliabilist metatheory of rationality?</p> <p>By analogy to <a href="http://en.wikipedia.org/wiki/Efficient-market_hypothesis">financial market efficiency</a>, I like to say that &ldquo;theoretical discovery is fairly inefficient.&rdquo; That is: there are often large, unnecessary delays in theoretical discovery.</p> <p>This shouldn&rsquo;t surprise us. For one thing, there aren&rsquo;t necessarily large personal rewards for making theoretical progress. But it does mean that those who <em>do</em> care about certain kinds of theoretical progress shouldn&rsquo;t necessarily think that progress will be hard. There is often low-hanging fruit to be plucked by investigators who know where to look.</p> <p>Where should we look for low-hanging fruit? 
I&rsquo;d guess that theoretical progress may be relatively easy where:</p> <ol> <li>Progress has no obvious, immediately profitable applications.</li> <li>Relatively few quality-adjusted researcher hours have been devoted to the problem.</li> <li>New tools or theoretical advances open up promising new angles of attack.</li> <li>Progress is only valuable to those with unusual views.</li> </ol> <p>These guesses make sense of the abundant low-hanging fruit in much of MIRI&rsquo;s theoretical research, with the glaring exception of decision theory. Our September decision theory workshop revealed plenty of low-hanging fruit, but why should that be? Decision theory is widely applied in <a href="http://en.wikipedia.org/wiki/Multi-agent_system">multi-agent systems</a>, and in philosophy it&rsquo;s clear that visible progress in decision theory is one way to &ldquo;make a name&rdquo; for oneself and advance one&rsquo;s career. Tons of quality-adjusted researcher hours have been devoted to the problem. Yes, new theoretical advances (e.g. causal Bayes nets and program equilibrium) open up promising new angles of attack, but they don&rsquo;t seem necessary to much of the low-hanging fruit discovered thus far. And progress in decision theory is definitely not valuable only to those with unusual views. What gives?</p> <p>Anyway, three questions:</p> <ol> <li>Do you agree about the relative inefficiency of theoretical discovery?</li> <li>What are some other signs of likely low-hanging fruit for theoretical progress?</li> <li>What&rsquo;s up with decision theory having so much low-hanging fruit?</li> </ol> <div><br /></div> <p><sup>1</sup><small> <a id="fn:1" href="http://commonsenseatheism.com/wp-content/uploads/2012/03/Good-Speculations-on-perceptrons-and-other-automata.pdf">Good (1959)</a> is the earliest statement of the intelligence explosion: &ldquo;Once a machine is designed that is good enough&hellip; it can be put to work designing an even better machine. At this point an &ldquo;explosion&rdquo; will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.&rdquo; The term itself, &ldquo;intelligence explosion,&rdquo; originates with <a href="http://commonsenseatheism.com/wp-content/uploads/2011/02/Good-Speculations-Concerning-the-First-Ultraintelligent-Machine.pdf">Good (1965)</a>. Technically, artist and philosopher&nbsp;<a href="http://en.wikipedia.org/wiki/Stefan_Themerson">Stefan Themerson</a> wrote a "philosophical analysis" of Good's intelligence explosion thesis called&nbsp;<em><a href="http://www.amazon.com/gp/product/0852471033/">Special Branch</a>,</em>&nbsp;published in 1972, but by "philosophical analysis" I have in mind a more analytic, argumentative kind of philosophical analysis than is found in Themerson's literary&nbsp;<em>Special Branch</em>.&nbsp;<a href="#fnref:1">&nbsp;↩</a></small></p> lukeprog jZtyt67PS2QfehkE8 2013-11-03T21:26:52.468Z Intelligence Amplification and Friendly AI https://www.lesswrong.com/posts/jmgyfDYDDYs7YpqJg/intelligence-amplification-and-friendly-ai <p><small>Part of the series <a href="/lw/ajm/ai_risk_and_opportunity_a_strategic_analysis/">AI Risk and Opportunity: A Strategic Analysis</a>. 
Previous articles on this topic: <a href="/lw/6mi/some_thoughts_on_singularity_strategies/">Some Thoughts on Singularity Strategies</a>, <a href="/lw/10l/intelligence_enhancement_as_existential_risk/">Intelligence enhancement as existential risk mitigation</a>, <a href="/lw/6j9/outline_of_possible_singularity_scenarios_that/">Outline of possible Singularity scenarios that are not completely disastrous</a>.</small></p> <p>Below are my quickly-sketched thoughts on intelligence amplification and FAI, without much effort put into organization or clarity, and without many references.<a id="fnref:1" class="footnote" title="see footnote" href="#fn:1">[1]</a> But first, I briefly review some strategies for increasing the odds of FAI, one of which is to work on intelligence amplification (IA).</p> <h3 id="somepossiblebestcurrentoptionsforincreasingtheoddsoffai"><a id="more"></a><br /></h3> <h3>Some possible &ldquo;best current options&rdquo; for increasing the odds of FAI</h3> <p>Suppose you find yourself in a pre-AGI world,<a id="fnref:2" class="footnote" title="see footnote" href="#fn:2">[2]</a> and you&rsquo;ve been convinced that the status quo world is unstable, and within the next couple centuries we&rsquo;ll likely<a id="fnref:3" class="footnote" title="see footnote" href="#fn:3">[3]</a> settle into one of <a href="http://intelligenceexplosion.com/2012/ai-the-problem-with-solutions/">four stable outcomes</a>: FAI, uFAI, non-AI extinction, or a sufficiently powerful global government which can prevent AGI development<a id="fnref:4" class="footnote" title="see footnote" href="#fn:4">[4]</a>. And you <em>totally</em> prefer the FAI option. What should you do to get there?</p> <ul> <li>Obvious direct approach: start solving the technical problems that must be solved to get FAI: goal stability under self-modification, decision algorithms that handle counterfactuals and logical uncertainty properly, indirect normativity, and so on. (<a href="http://intelligence.org/research/">MIRI&rsquo;s work</a>, some FHI work.)</li> <li>Do strategy research, to potentially identify superior alternatives to the other items on this list, or superior versions of the things on this list already. (<a href="http://www.fhi.ox.ac.uk/">FHI&rsquo;s work</a>, some MIRI work, etc.)</li> <li>Accelerate IA technologies, so that smarter humans can tackle FAI. (E.g. <a href="http://intelligence.org/2013/08/31/stephen-hsu-on-cognitive-genomics/">cognitive genomics</a>.)</li> <li>Try to make sure we get high-fidelity <a href="http://www.fhi.ox.ac.uk/brain-emulation-roadmap-report.pdf">WBEs</a> before AGI, without WBE work first enabling dangerous neuromorphic AGI. (<a href="http://nemaload.davidad.org/">Dalrymple&rsquo;s work?</a>)</li> <li>Improve political and scientific institutions so that the world is <a href="http://intelligence.org/2013/09/12/how-well-will-policy-makers-handle-agi-initial-findings/">more likely to handle AGI wisely</a> when it comes. (<a href="http://en.wikipedia.org/wiki/Prediction_market">Prediction markets?</a> <a href="http://www.vannevargroup.org/">Vannevar Group?</a>)</li> <li>Capacity-building. Grow the rationality community, the x-risk reduction community, the effective altruism movement, etc.</li> <li>Other stuff. (More in later posts).</li> </ul> <h3 id="theiaroute"><br /></h3> <h3>The IA route</h3> <p>Below are some key considerations about the IA route. I&rsquo;ve numbered them so they&rsquo;re easy to refer to later. 
My discussion assumes <a href="http://intelligence.org/2013/05/05/five-theses-two-lemmas-and-a-couple-of-strategic-implications/">MIRI&rsquo;s basic assumptions</a>, including timelines similar to <a href="http://intelligence.org/2013/05/15/when-will-ai-be-created/">my own AGI timelines</a>.</p> <ol> <li>Maybe FAI is so hard that we can <em>only</em> get FAI with a large team of IQ 200+ humans, whereas uFAI can be built by a field of IQ 130&ndash;170 humans with a few more decades and lots of computing power and trial and error. So to have any chance of FAI at all, we&rsquo;ve got to do WBE or IA first.</li> <li>You could accelerate FAI relative to AGI if you somehow kept IA technology secret, for use only by FAI researchers (and maybe their supporters).</li> <li>Powerful IA technologies would likely get wide adoption, and accelerate economic growth and scientific progress in general. If you think <a href="/lw/hoz/do_earths_with_slower_economic_growth_have_a/">Earths with slower economic growth have a better chance at FAI</a>, that could be bad for our FAI chances. If you think the opposite, then broad acceleration from IA could be good for FAI.</li> <li>Maybe IA increases one&rsquo;s &ldquo;rationality&rdquo; and &ldquo;philosophical ability&rdquo; (in scare quotes because we mostly don&rsquo;t know how to measure them yet), and thus IA increases the frequency with which people will realize the risks of AGI and do sane things about it.</li> <li>Maybe IA increases the role of intelligence and designer understanding, relative to hardware and accumulated knowledge, in AI development.<a id="fnref:5" class="footnote" title="see footnote" href="#fn:5">[5]</a></li> </ol> <p>Below are my thoughts about all this. These are only <em>my current views</em>: other MIRI personnel (including Eliezer) disagree with some of the points below, and I wouldn&rsquo;t be surprised to change my mind about some of these things after extended discussion (hopefully in public, on Less Wrong).</p> <p>I doubt (1) is true. I think IQ 130&ndash;170 humans could figure out FAI in 50&ndash;150 years if they were trying to solve the right problems, and if FAI development wasn&rsquo;t in a death race with the strictly easier problem of uFAI. If normal smart humans <em>aren&rsquo;t</em> capable of building FAI in that timeframe, that&rsquo;s probably for lack of rationality and philosophical skill, not for lack of IQ. And I&rsquo;m not confident that rationality and philosophical skill predictably improve with IQ after about IQ 140. It&rsquo;s a good sign that <a href="http://www.karolek.w.tkb.pl/nature98.pdf">atheism increases with IQ after IQ 140</a>, but on the other hand I know too many high-IQ people who think that (e.g.) an AI that maximizes K-complexity is a win, and also there&rsquo;s <a href="http://keithstanovich.com/Site/Research_on_Reasoning_files/Stanovich_CDPS_2013.pdf">Stanovich&rsquo;s research</a> on how IQ and rationality come apart. For these reasons, I&rsquo;m also not convinced (4) would be a large positive effect on our FAI chances.</p> <p>Can we train people in rationality and philosophical skill beyond that of say, the 95th percentile Less Wronger? <a href="http://rationality.org/">CFAR</a> has plans to find out, but they need to grow a lot first to execute such an ambitious research program.</p> <p>(2) looks awfully hard, unless we can find a powerful IA technique that also, say, gives you a 10% chance of cancer. 
Then some EAs devoted to building FAI might just use the technique, and maybe the AI community in general doesn&rsquo;t.</p> <p>(5) seems right, though I doubt it&rsquo;ll be a big enough effect to make a difference for the final outcome.</p> <p>I think (3) is the dominant consideration here, along with the worry about lacking the philosophical skill (but not IQ) to build FAI at all. At the moment, I (sadly) lean toward the view that slower Earths have a better chance at FAI. (Much of my brain doesn&rsquo;t know this, though: I remember reading the <a href="http://www.themoneyillusion.com/?p=23653">Summers news</a> with glee, and then remembering that on my current model this was actually bad news for FAI.)</p> <p>I could say more, but I&rsquo;ll stop for now and see what comes up in discussion.</p> <div class="footnotes"> <hr /> <ol> <li id="fn:1"> <p>My thanks to Justin Shovelain for sending me his <a href="/lw/6mi/some_thoughts_on_singularity_strategies/4ikh">old notes</a> on the &ldquo;IA first&rdquo; case, and to Wei Dai, Carl Shulman, and Eliezer Yudkowsky for their feedback on this post. <a class="reversefootnote" title="return to article" href="#fnref:1">&nbsp;↩</a></p> </li> <li id="fn:2"> <p>Not counting civilizations that might be simulating our world. This matters, but I won&rsquo;t analyze that here. <a class="reversefootnote" title="return to article" href="#fnref:2">&nbsp;↩</a></p> </li> <li id="fn:3"> <p>There are other possibilities. For example, there could be a global nuclear war that kills all but about 100,000 people, which could set back social, economic, and technological progress by centuries, thus delaying the crucial point in Earth&rsquo;s history in which it settles into one of the four stable outcomes. <a class="reversefootnote" title="return to article" href="#fnref:3">&nbsp;↩</a></p> </li> <li id="fn:4"> <p>And perhaps also advanced nanotechnology, intelligence amplification technologies, and whole brain emulation. <a class="reversefootnote" title="return to article" href="#fnref:4">&nbsp;↩</a></p> </li> <li id="fn:5"> <p>Thanks to Carl Shulman for making this point. <a class="reversefootnote" title="return to article" href="#fnref:5">&nbsp;↩</a></p> </li> </ol></div> lukeprog jmgyfDYDDYs7YpqJg 2013-09-27T01:09:15.978Z AI ebook cover design brainstorming https://www.lesswrong.com/posts/wBaWgt45WWfv5ZTPD/ai-ebook-cover-design-brainstorming <p>Thanks to everyone who brainstormed&nbsp;<a href="/r/discussion/lw/io3/help_us_name_a_short_primer_on_ai_risk/">possible titles for MIRI&rsquo;s upcoming ebook on machine intelligence</a>. Our leading contender for the book title is&nbsp;<em>Smarter than Us: The Rise of Machine Intelligence</em>.</p> <p>What we need now are suggestions for a book <strong>cover design</strong>. AI is hard to depict without falling back on cliches, such as a brain image mixed with computer circuitry, a humanoid robot, HAL, an imitation of <em><a href="http://en.wikipedia.org/wiki/The_Creation_of_Adam">Creation of Adam</a></em>&nbsp;with human and robot fingers touching, or an imitation of <em><a href="http://en.wikipedia.org/wiki/March_of_Progress">March of Progress</a></em>&nbsp;with an AI at the far right.</p> <p>A few ideas/examples:</p> <ol> <li> <p>Something that conveys &lsquo;AI&rsquo; in the middle (a computer screen? a server tower?) 
connected by arrow/wires/something to various &lsquo;skills/actions/influences&rsquo;, like giving a speech, flying unmanned spacecraft, doing science, predicting the stock market, etc., in an attempt to convey the diverse superpowers of a machine intelligence.</p> </li> <li> <p>A more minimalist text-only cover.</p> </li> <li> <p>A fairly minimal cover with just an ominous-looking server rack in the middle, with a few blinking lights and submerged in darkness around it. A bit like <a href="http://www.amazon.com/Fade-Lisa-Mcmann/dp/2732455334/ref=sr_1_4?ie=UTF8&amp;qid=1380239258&amp;sr=8-4&amp;keywords=fade+lisa+mcmann">this cover</a>.</p> </li> <li> <p>Similar to the above, except a server farm along the bottom fading into the background, with a frame composition similar to&nbsp;<a href="http://bookcoverarchive.com/book/columbine">this</a>.</p> </li> <li> <p>A darkened, machine-gunned room with a laptop sitting alone on a desk, displaying the text of the title on the screen. (This is the scene from the first chapter, about a Terminator who encounters an unthreatening-looking laptop which ends up being way more powerful and dangerous than the Terminator because it is more intelligent.)</p> </li> </ol> <p>Alex Vermeer sketched the first four of these ideas:</p> <p><a href="http://imgur.com/mkpSy2c"><img src="http://i.imgur.com/mkpSy2c.png" alt="" /></a></p> <p>Some general inspiration may be found <a href="http://bookcoverarchive.com/">here</a>.</p> <p>We think we want something kinda dramatic, rather than cartoony, but less epic and unbelievable than the <em><a href="http://www.amazon.com/Facing-the-Intelligence-Explosion-ebook/dp/B00C7YOR5Q/">Facing the Intelligence Explosion</a></em> cover.</p> <p>Thoughts?</p> lukeprog wBaWgt45WWfv5ZTPD 2013-09-26T23:49:03.319Z Help us Optimize the Contents of the Sequences eBook https://www.lesswrong.com/posts/ZsDmi6XbLu3LF3gi7/help-us-optimize-the-contents-of-the-sequences-ebook <p>MIRI's ongoing effort to publish&nbsp;<a href="http://wiki.lesswrong.com/wiki/Sequences">the sequences</a> as an eBook has given us the opportunity to update their contents and organization.</p> <p>We're looking for suggested posts to <strong>reorder, add, or remove</strong>.</p> <p>To help with this, here is a breakdown of the <em>current planned contents of the eBook</em> and any currently planned modifications. 
Following that is a list of the most popular links within the sequences to posts that are <em>not included</em> therein.</p> <p>Now's a good time to suggest changes or improvements!</p> <p>&mdash;&mdash;&mdash;</p> <h2><a href="http://wiki.lesswrong.com/wiki/Map_and_Territory_(sequence)">Map and Territory</a></h2> <p>Added <a href="/lw/gp/whats_a_bias_again/">&hellip;What's a Bias Again?</a> because it's meant to immediately follow <a href="/lw/go/why_truth_and/">Why Truth, And&hellip;</a>.</p> <h2><a href="http://wiki.lesswrong.com/wiki/Mysterious_Answers_to_Mysterious_Questions">Mysterious Answers to Mysterious Questions</a></h2> <p>No changes.</p> <h2><a href="http://wiki.lesswrong.com/wiki/A_Human%27s_Guide_to_Words">A Human's Guide to Words</a></h2> <p>No changes.</p> <h2><a href="http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind">How to Actually Change Your Mind</a></h2> <h4><a href="http://wiki.lesswrong.com/wiki/Politics_is_the_Mind-Killer">Politics is the Mind-Killer</a></h4> <p>Removed <a href="/lw/lt/the_robbers_cave_experiment/">The Robbers Cave Experiment</a> because it already appears in Death Spirals and the Cult Attractor, and there it appears in the original chronological order, which flows better.</p> <h4><a href="http://wiki.lesswrong.com/wiki/Death_Spirals_and_the_Cult_Attractor">Death Spirals and the Cult Attractor</a></h4> <p>Removed <a href="/lw/m2/the_litany_against_gurus/">The Litany Against Gurus</a> because it already appears in Politics is the Mind-killer.</p> <h4><a href="http://wiki.lesswrong.com/wiki/Seeing_with_Fresh_Eyes">Seeing with Fresh Eyes</a></h4> <p>Removed <a href="/lw/m9/aschs_conformity_experiment/">Asch's Conformity Experiment</a> and <a href="/lw/mb/lonely_dissent/">Lonely Dissent</a> because they both appear at the end of Death Spirals. Removed <a href="/lw/s3/the_genetic_fallacy/">The Genetic Fallacy</a> because it's in the Metaethics sequence: that's where it falls chronologically and it fits better there with the surrounding posts.</p> <h4><a href="http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind#Noticing_Confusion">Noticing Confusion</a></h4> <p><em>Removed this entire subsequence</em> because it is entirely contained within Mysterious Answers to Mysterious Questions.</p> <h4><a href="http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind#Against_Rationalization">Against Rationalization</a></h4> <p>Added <a href="/lw/kd/pascals_mugging_tiny_probabilities_of_vast/">Pascal's Mugging</a> (before Torture vs Dust Specks) because it explains the 3^^^3 notation.
Added <a href="/lw/kn/torture_vs_dust_specks/">Torture vs Dust Specks</a> before <a href="/lw/ko/a_case_study_of_motivated_continuation/">A Case Study of Motivated Continuation</a> because A Case Study refers to it frequently.</p> <h4><a href="http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind#Against_Doublethink">Against Doublethink</a></h4> <p>No changes.</p> <h4><a href="http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind#Overly_Convenient_Excuses">Overly Convenient Excuses</a></h4> <p>Removed <a href="/lw/jr/how_to_convince_me_that_2_2_3/">How to Convince Me that 2+2=3</a> because it's already in Map &amp; Territory.</p> <h4><a href="http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind#Letting_Go">Letting Go</a></h4> <p>No change.</p> <h2><a href="http://wiki.lesswrong.com/wiki/Evolution">The Simple Math of Evolution</a></h2> <p>Added <a href="/lw/l1/evolutionary_psychology/">Evolutionary Psychology</a> because it fits nicely at the end and it's referred to by other posts many times.</p> <h2><a href="http://wiki.lesswrong.com/wiki/Challenging_the_Difficult">Challenging the Difficult</a></h2> <p>No change.</p> <h2><a href="http://wiki.lesswrong.com/wiki/Yudkowsky%27s_coming_of_age">Yudkowsky's Coming of Age</a></h2> <p>No change.</p> <h2><a href="http://wiki.lesswrong.com/wiki/Reductionism_(sequence">Reductionism</a></h2> <p>No change. (Includes the Zombies subsequence.)</p> <h2><a href="/lw/r5/the_quantum_physics_sequence/">Quantum Physics</a></h2> <p>No change. Doesn't include any "Preliminaries" posts, since they'd all be duplicates</p> <h2><a href="http://wiki.lesswrong.com/wiki/Metaethics_sequence">Metaethics</a></h2> <p>No change.</p> <h2><a href="/lw/xy/the_fun_theory_sequence/">Fun Theory</a></h2> <p>No change.</p> <h2><a href="/lw/cz/the_craft_and_the_community/">The Craft and the Community</a></h2> <p>No change.</p> <h2>Appendix</h2> <p>Includes:</p> <ul> <li><a href="http://yudkowsky.net/rational/the-simple-truth/">The Simple Truth</a></li> <li><a href="http://yudkowsky.net/rational/bayes/">An Intuitive Explanation of Bayes' Theorem</a></li> <li><a href="http://yudkowsky.net/rational/technical/">A Technical Explanation of Technical Explanation</a></li> </ul> <p>&mdash;&mdash;&mdash;</p> <p>Here are the most-frequently-referenced links within the sequences to posts outside of the sequences (with a count of three or more). 
This may help you notice posts that you think should be included in the sequences eBook.</p> <ul> <li><a href="/lw/nc/newcombs_problem_and_regret_of_rationality/">Newcomb's Problem and Regret of Rationality</a> =&gt; 24</li> <li><a href="/lw/o5/the_second_law_of_thermodynamics_and_engines_of/">The Second Law of Thermodynamics, and Engines of Cognition</a> =&gt; 22</li> <li><a href="/lw/l4/terminal_values_and_instrumental_values/">Terminal Values and Instrumental Values</a> =&gt; 16</li> <li><a href="/lw/jk/burdensome_details/">Burdensome Details</a> =&gt; 16</li> <li><a href="/lw/kg/expecting_short_inferential_distances/">Expecting Short Inferential Distances</a> =&gt; 15</li> <li><a href="/lw/l3/thou_art_godshatter/">Thou Art Godshatter</a> =&gt; 14</li> <li><a href="/lw/i8/religions_claim_to_be_nondisprovable/">Religion's Claim to be Non-Disprovable</a> =&gt; 14</li> <li><a href="/lw/hw/scope_insensitivity/">Scope Insensitivity</a> =&gt; 13</li> <li><a href="/lw/rc/the_ultimate_source/">The Ultimate Source</a> =&gt; 13</li> <li><a href="/lw/kj/no_one_knows_what_science_doesnt_know/">No One Knows What Science Doesn't Know</a> =&gt; 12</li> <li><a href="/lw/rm/the_design_space_of_mindsingeneral/">The Design Space of Minds-In-General</a> =&gt; 11</li> <li><a href="/lw/hs/think_like_reality/">Think Like Reality</a> =&gt; 10</li> <li><a href="/lw/rd/passing_the_recursive_buck/">Passing the Recursive Buck</a> =&gt; 9</li> <li><a href="/lw/le/lost_purposes/">Lost Purposes</a> =&gt; 9</li> <li><a href="/lw/ld/the_hidden_complexity_of_wishes/">The Hidden Complexity of Wishes</a> =&gt; 9</li> <li><a href="/lw/in/scientific_evidence_legal_evidence_rational/">Scientific Evidence, Legal Evidence, Rational Evidence</a> =&gt; 9</li> <li><a href="/lw/k2/a_priori/">A Priori</a> =&gt; 8</li> <li><a href="/lw/mt/beautiful_probability/">Beautiful Probability</a> =&gt; 8</li> <li><a href="/lw/rb/possibility_and_couldness/">Possibility and Could-ness</a> =&gt; 8</li> <li><a href="/lw/j6/why_is_the_future_so_absurd/">Why is the Future So Absurd?</a> =&gt; 8</li> <li><a href="/lw/lq/fake_utility_functions/">Fake Utility Functions</a> =&gt; 8</li> <li><a href="/lw/j5/availability/">Availability</a> =&gt; 7</li> <li><a href="/lw/rf/ghosts_in_the_machine/">Ghosts in the Machine</a> =&gt; 7</li> <li><a href="/lw/x5/nonsentient_optimizers/">Nonsentient Optimizers</a> =&gt; 7</li> <li><a href="/lw/lp/fake_fake_utility_functions/">Fake Fake Utility Functions</a> =&gt; 7</li> <li><a href="/lw/o7/searching_for_bayesstructure/">Searching for Bayes-Structure</a> =&gt; 7</li> <li><a href="/lw/gv/outside_the_laboratory/">Outside the Laboratory</a> =&gt; 7</li> <li><a href="/lw/tf/dreams_of_ai_design/">Dreams of AI Design</a> =&gt; 6</li> <li><a href="/lw/rj/surface_analogies_and_deep_causes/">Surface Analogies and Deep Causes</a> =&gt; 6</li> <li><a href="/lw/l9/artificial_addition/">Artificial Addition</a> =&gt; 6</li> <li><a href="/lw/lb/not_for_the_sake_of_happiness_alone/">Not for the Sake of Happiness (Alone)</a> =&gt; 6</li> <li><a href="/lw/h3/superstimuli_and_the_collapse_of_western/">Superstimuli and the Collapse of Western Civilization</a> =&gt; 5</li> <li><a href="/lw/q4/decoherence_is_falsifiable_and_testable/">Decoherence is Falsifiable and Testable</a> =&gt; 5</li> <li><a href="/lw/t6/the_cartoon_guide_to_l&ouml;bs_theorem/">The Cartoon Guide to L&ouml;b's Theorem</a> =&gt; 5</li> <li><a href="/lw/x7/cant_unbirth_a_child/">Can't Unbirth a Child</a> =&gt; 5</li> <li><a 
href="/lw/rl/the_psychological_unity_of_humankind/">The Psychological Unity of Humankind</a> =&gt; 5</li> <li><a href="/lw/so/humans_in_funny_suits/">Humans in Funny Suits</a> =&gt; 5</li> <li><a href="/lw/7i/rationality_is_systematized_winning/">Rationality is Systematized Winning</a> =&gt; 5</li> <li><a href="/lw/tn/the_true_prisoners_dilemma/">The True Prisoner's Dilemma</a> =&gt; 5</li> <li><a href="/lw/m7/zen_and_the_art_of_rationality/">Zen and the Art of Rationality</a> =&gt; 5</li> <li><a href="/lw/n9/the_intuitions_behind_utilitarianism/">The "Intuitions" Behind "Utilitarianism"</a> =&gt; 5</li> <li><a href="/lw/ws/for_the_people_who_are_still_alive/">For The People Who Are Still Alive</a> =&gt; 4</li> <li><a href="/lw/mg/the_twoparty_swindle/">The Two-Party Swindle</a> =&gt; 4</li> <li><a href="/lw/ji/conjunction_fallacy/">Conjunction Fallacy</a> =&gt; 4</li> <li><a href="/lw/st/anthropomorphic_optimism/">Anthropomorphic Optimism</a> =&gt; 4</li> <li><a href="/lw/gr/the_modesty_argument/">The Modesty Argument</a> =&gt; 4</li> <li><a href="http://wiki.lesswrong.com/wiki/Rational_evidence">Rational evidence</a> =&gt; 4</li> <li><a href="/lw/hk/priors_as_mathematical_objects/">Priors as Mathematical Objects</a> =&gt; 4</li> <li><a href="/lw/a6/the_unfinished_mystery_of_the_shangrila_diet/">The Unfinished Mystery of the Shangri-La Diet/</a> =&gt; 4</li> <li><a href="/lw/ig/i_defy_the_data/">I Defy the Data!</a> =&gt; 4</li> <li><a href="/lw/9j/bystander_apathy/">Bystander Apathy</a> =&gt; 3</li> <li><a href="/lw/ja/we_dont_really_want_your_participation/">We Don't Really Want Your Participation</a> =&gt; 3</li> <li><a href="/lw/wq/you_only_live_twice/">You Only Live Twice</a> =&gt; 3</li> <li><a href="/lw/vm/lawful_creativity/">Lawful Creativity</a> =&gt; 3</li> <li><a href="/lw/hx/one_life_against_the_world/">One Life Against the World</a> =&gt; 3</li> <li><a href="http://wiki.lesswrong.com/wiki/Locate_the_hypothesis">Locate the hypothesis</a> =&gt; 3</li> <li><a href="/lw/ym/cynical_about_cynicism/">Cynical About Cynicism</a> =&gt; 3</li> <li><a href="/lw/tx/optimization/">Optimization</a> =&gt; 3</li> <li><a href="/lw/ke/illusion_of_transparency_why_no_one_understands/">Illusion of Transparency: Why No One Understands You</a> =&gt; 3</li> <li><a href="/lw/sp/detached_lever_fallacy/">Detached Lever Fallacy</a> =&gt; 3</li> <li><a href="/lw/n3/circular_altruism/">Circular Altruism</a> =&gt; 3</li> <li><a href="/lw/my/the_allais_paradox/">The Allais Paradox</a> =&gt; 3</li> <li><a href="/lw/gn/the_martial_art_of_rationality/">The Martial Art of Rationality</a> =&gt; 3</li> <li><a href="/lw/ky/fake_morality/">Fake Morality</a> =&gt; 3</li> </ul> <p>Suggestions?</p> lukeprog ZsDmi6XbLu3LF3gi7 2013-09-19T04:31:20.391Z Help us name a short primer on AI risk! https://www.lesswrong.com/posts/atyyMJ6z6d3mpvMjm/help-us-name-a-short-primer-on-ai-risk <p>MIRI will soon publish a short book by Stuart Armstrong on the topic of AI risk. The book is <em>currently</em> titled &ldquo;AI-Risk Primer&rdquo; by default, but we&rsquo;re looking for something a little more catchy (just as we did for the upcoming <a href="/lw/h7t/help_us_name_the_sequences_ebook/">Sequences ebook</a>).</p> <p>The book is meant to be accessible and avoids technical jargon. Here is the table of contents and a few snippets from the book, to give you an idea of the content and style:</p> <ol> <li>Terminator versus the AI</li> <li>Strength versus Intelligence</li> <li>What Is Intelligence? 
Can We Achieve It Artificially?</li> <li>How Powerful Could AIs Become?</li> <li>Talking to an Alien Mind</li> <li>Our Values Are Complex and Fragile</li> <li>What, Precisely, Do We Really (Really) Want?</li> <li>We Need to Get It All <em>Exactly</em> Right</li> <li>Listen to the Sound of Absent Experts</li> <li>A Summary</li> <li>That&rsquo;s Where <em>You</em> Come In &hellip;</li> </ol> <blockquote> <p>The Terminator is a creature from our primordial nightmares: tall, strong, aggressive, and nearly indestructible. We&rsquo;re strongly primed to fear such a being&mdash;it resembles the lions, tigers, and bears that our ancestors so feared when they wandered alone on the savanna and tundra.</p> <p>&hellip;</p> <p>As a species, we humans haven&rsquo;t achieved success through our natural armor plating, our claws, our razor-sharp teeth, or our poison-filled stingers. Though we have reasonably efficient bodies, it&rsquo;s our <em>brains</em> that have made the difference. It&rsquo;s through our social, cultural, and technological intelligence that we have raised ourselves to our current position.</p> <p>&hellip;</p> <p>Consider what would happen if an AI ever achieved the ability to function socially&mdash;to hold conversations with a reasonable facsimile of human fluency. For humans to increase their social skills, they need to go through painful trial and error processes, scrounge hints from more articulate individuals or from television, or try to hone their instincts by having dozens of conversations. An AI could go through a similar process, undeterred by social embarrassment, and with perfect memory. But it could also sift through vast databases of previous human conversations, analyze thousands of publications on human psychology, anticipate where conversations are leading many steps in advance, and always pick the right tone and pace to respond with. Imagine a human who, every time they opened their mouth, had spent a solid year to ponder and research whether their response was going to be maximally effective. That is what a social AI would be like.</p> </blockquote> <p>So, title suggestions?</p> lukeprog atyyMJ6z6d3mpvMjm 2013-09-17T20:35:34.895Z Help MIRI run its Oxford UK workshop in November https://www.lesswrong.com/posts/F6wRgFH9kz63LzTAB/help-miri-run-its-oxford-uk-workshop-in-november <p>This <a href="http://intelligence.org/2013/08/30/miris-november-2013-workshop-in-oxford/">November 23-29</a>, MIRI is running its first European research workshop, at Oxford University.</p> <p>We need somebody familiar with Oxford UK to (1) help us locate and secure lodging for the workshop participants ahead of time, (2) order food for delivery during the workshop, and (3) generally handle on-the-ground logistics.</p> <p><a href="https://docs.google.com/forms/d/1OPrgtD-ME6_3lcd7-6wQyWJCK1SGnZBq3lg6Uo0KoEA/viewform">Apply here</a> for the chance to:</p> <p><ol> <li>Work with, and hang out with, MIRI staff.</li> <li>Spend some time (during breaks) with the workshop participants.</li> <li>Help MIRI work towards its goals.</li> </ol></p> <p>You can either volunteer to help us for free, or indicate how much you'd need to be paid per hour to take the job.</p> lukeprog F6wRgFH9kz63LzTAB 2013-09-15T03:13:36.553Z How well will policy-makers handle AGI? 
(initial findings) https://www.lesswrong.com/posts/4JGQtc4ZgmdsrtBhB/how-well-will-policy-makers-handle-agi-initial-findings <p><small>Cross-posted from <a href="http://intelligence.org/2013/09/12/how-well-will-policy-makers-handle-agi-initial-findings/">MIRI's blog</a>.</small></p> <p>MIRI's <a href="http://intelligence.org/about/">mission</a> is "to ensure that the creation of smarter-than-human intelligence has a positive impact." One policy-relevant question is: <strong>How well should we expect policy makers to handle the invention of AGI, and what does this imply about how much effort to put into AGI risk mitigation vs. other concerns?</strong></p> <p>To investigate these questions, we asked&nbsp;<a href="http://mathisbeauty.org/aboutme.html">Jonah Sinick</a>&nbsp;to examine how well policy-makers handled past events analogous in some ways to the future invention of AGI, and summarize his findings. We pre-committed to publishing our entire email exchange on the topic (with minor editing), just as with our project on <a href="http://intelligence.org/2013/09/02/how-effectively-can-we-plan-for-future-decades">how well we can plan for future decades</a>. The post below is a summary of findings from&nbsp;<a href="https://intelligence.org/wp-content/uploads/2013/09/Elites-and-AI.docx">our full email exchange (.docx)</a>&nbsp;so far.</p> <p>As with our investigation of how well we can plan for future decades,<strong>&nbsp;we decided to publish our initial findings after investigating only a few historical cases</strong>. This allows us to gain feedback on the value of the project, as well as suggestions for improvement, before continuing. It also means that&nbsp;<strong>we aren't yet able to draw any confident conclusions about our core questions</strong>.</p> <p>The most significant results from this project so far are:</p> <ol> <li>We came up with a preliminary list of 6 seemingly-important ways in which a historical case could be analogous to the future invention of AGI, and evaluated several historical cases on these criteria.</li> <li>Climate change risk seems sufficiently disanalogous to AI risk that studying climate change mitigation efforts probably gives limited insight into how well policy-makers will deal with AGI risk: the expected damage of climate change appears to be very small relative to the expected damage due to AI risk, especially when one looks at expected damage to policy makers.</li> <li>The 2008 financial crisis appears, after a shallow investigation, to be sufficiently analogous to AGI risk that it should give us some small reason to be concerned that policy-makers will not manage the invention of AGI wisely.</li> <li>The risks to critical infrastructure from geomagnetic storms are far too small to be in the same reference class with risks from AGI.</li> <li>The eradication of smallpox is only somewhat analogous to the invention of AGI.</li> <li>Jonah performed very shallow investigations of how policy-makers have handled risks from cyberwarfare, chlorofluorocarbons, and the Cuban missile crisis, but these cases need more study before even "initial thoughts" can be given.</li> <li>We identified additional historical cases that could be investigated in the future.</li> </ol> <p>Further details are given below.
For sources and more, please see&nbsp;<a href="https://intelligence.org/wp-content/uploads/2013/09/Elites-and-AI.docx">our full email exchange (.docx)</a>.</p> <!--more--> <h3><br /></h3> <h3>6 ways a historical case can be analogous to the invention of AGI</h3> <p>In conversation, Jonah and I identified six features of the future invention of AGI that, if largely shared by a historical case, seem likely to allow the historical case to shed light on how well policy-makers will deal with the invention of AGI:</p> <ol> <li>AGI may become a major threat at a somewhat unpredictable time.</li> <li>AGI may become a threat when the world has very limited experience with it.</li> <li>A good outcome with AGI may require solving a difficult global coordination problem.</li> <li>Preparing for the AGI threat adequately may require lots of careful work in advance.</li> <li>Policy-makers have strong personal incentives to solve the AGI problem.</li> <li>A bad outcome with AGI would be a global disaster, and a good outcome with AGI would have global humanitarian benefit.</li> </ol> <p>More details on these criteria and their use are given in the second email of our full email exchange. &nbsp;</p> <h3><br /></h3> <h3>Risks from climate change</h3> <p>People began to see climate change as a potential problem in the early 1970s, but there was some ambiguity as to whether human activity was causing warming (because of carbon emissions) or cooling (because of smog particles). The first <a href="http://en.wikipedia.org/wiki/Intergovernmental_Panel_on_Climate_Change">IPCC</a> report was issued in 1990, and stated that there was substantial anthropogenic global warming due to greenhouse gases. By 2001, there was a strong scientific consensus behind this claim. While policy-makers' response to risks from climate change might seem likely to shed light on whether policy-makers will deal wisely with AGI, there are some important disanalogies:</p> <ul> <li>The harms of global warming are expected to fall disproportionately on disadvantaged people in poor countries, not on policy-makers. So policy-makers have much less personal incentive to solve the problem than is the case with AGI.</li> <li>In the median case, humanitarian losses from global warming <a href="/lw/hi1/potential_impacts_of_climate_change/">seem to be</a> about 20% of GDP per year for the poorest people.&nbsp;In light of anticipated economic development and diminishing marginal utility, this is a <em>much</em> smaller negative humanitarian impact than AGI risk (even ignoring future generations). For example, economist Indur Goklany <a href="http://wattsupwiththat.com/2012/10/17/is-climate-change-the-number-one-threat-to-humanity/">estimated</a> that "through 2085, only 13% of [deaths] from hunger, malaria, and extreme weather events (including coastal flooding from sea level rise) should be from [global] warming."</li> <li>Thus, potential analogies to AGI risk come from climate change's <em>tail risk</em>.
But there seem to be few credentialed scientists who have views compatible with a prediction that even a temperature increase in the 95th percentile of the probability distribution (by 2100) would do more than just begin to render some regions of Earth uninhabitable.</li> <li>According to the <a href="http://www.ipcc.ch/meetings/session31/inf3.pdf">5th IPCC</a>, the risk of human extinction from climate change seems very low: "Some thresholds that all would consider dangerous have no support in the literature as having a non-negligible chance of occurring. For instance, a 'runaway greenhouse effect'&mdash;analogous to Venus&mdash;appears to have virtually no chance of being induced by anthropogenic activities."</li> </ul> &nbsp; <h3>The 2008 financial crisis</h3> <p>Jonah did a shallow investigation of the 2008 financial crisis, but the preliminary findings are interesting enough for us to describe them in some detail. Jonah's impressions about the relevance of the 2008 financial crisis to the AGI situation are based on a reading of&nbsp;<em><a href="http://www.amazon.com/After-Music-Stopped-Financial-Response/dp/1594205302/">After the Music Stopped</a></em> by Alan Blinder, who was the vice chairman of the federal reserve for 1.5 years during the Clinton administration. Naturally, many additional sources should be consulted before drawing firm conclusions about the relevance of policy-makers' handling of the financial crisis to their likelihood of handling AGI wisely.</p> <p>Blinder's seven main factors leading to the recession are (p. 27):</p> <ol> <li>Inflated asset prices, especially of houses (the housing bubble) but also of certain securities (the bond bubble);</li> <li>Excessive leverage (heavy borrowing) throughout the financial system and the economy;</li> <li>Lax financial regulation, both in terms of what the law left unregulated and how poorly the various regulators performed their duties;</li> <li>Disgraceful banking practices in subprime and other mortgage lending;</li> <li>The crazy-quilt of unregulated securities and derivatives that were built on these bad mortgages;</li> <li>The abysmal performance of the statistical rating agencies, which helped the crazy-quilt get stitched together; and</li> <li>The perverse compensation systems in many financial institutions that created powerful incentives to go for broke.</li> </ol> <p>With these factors in mind, let's look at the strength of the analogy between the 2008 financial crisis and the future invention of AGI:</p> <ol> <li>Almost tautologically, a financial crisis is unexpected, though we do know that financial crises happen with some regularity.</li> <li>The 2008 financial crisis was not unprecedented in kind, only in degree (in some ways).</li> <li>Avoiding the 2008 financial crisis would have required solving a difficult national coordination problem, rather than a global coordination problem. Still, this analogy seems fairly strong. As Jonah writes, "While the 2008 financial crisis seems to have been largely US specific (while having broader ramifications), there's a sense in which preventing it would have required solving a difficult coordination problem. 
The causes of the crisis are diffuse, and responsibility falls on many distinct classes of actors."</li> <li>Jonah's analysis wasn't deep enough to discern whether the 2008 financial crisis is analogous to the future invention of AGI with regard to how much careful work would have been required in advance to avert the risk.</li> <li>In contrast with AI risk, the financial crisis wasn't a life or death matter for almost any of the actors involved. Many people in finance didn't have incentives to avert the financial crisis: indeed, some of the key figures involved were rewarded with large bonuses. But it's plausible that government decision makers had incentive to avert a financial crisis for reputational reasons, and many interest groups are adversely affected by financial crises.</li> <li>Once again, the scale of the financial crisis wasn't on a par with AI risk, but it was closer to that scale than the other risks Jonah looked at in this initial investigation.</li> </ol> <p>Jonah concluded that "the conglomerate of poor decisions [leading up to] the 2008 financial crisis constitute a small but significant challenge to the view that [policy-makers] will successfully address AI risk." His reasons were:</p> <ol> <li>The magnitude of the financial crisis is nontrivial (even if small) compared with the magnitude of the AI risk problem (not counting future generations).</li> <li>The financial crisis adversely affected a very broad range of people, apparently including a large fraction of those people in positions of power (this seems truer here than in the case of climate change). A recession is bad for most businesses and for most workers. Yet these actors weren't able to recognize the problem, coordinate, and prevent it.</li> <li>The reasons that policy-makers weren't able to recognize the problem, coordinate, and prevent it seem related to reasons why people might not recognize AI risk as a problem, coordinate, and prevent it. First, several&nbsp;key actors involved seem to have exhibited conspicuous overconfidence and neglect of tail risk (e.g. Summers, etc. ignoring Brooksley Born's warnings about excessive leverage). If true, this shows that people in positions of power are notably susceptible to overconfidence and neglect of tail risk. Avoiding overconfidence and giving sufficient weight to tail risk may be crucial in mitigating AI risk.&nbsp;Second, one gets a sense that bystander effect and tragedy of the commons played a large role in the case of the financial crisis. There are risks that weren't adequately addressed because doing so didn't fall under the purview of any of the existing government agencies. This may have corresponded to a mentality of the type "that's not my job &mdash; somebody else can take care of it." If people think that AI risk is large, then they might think "if nobody's going to take care of it then I will, because otherwise I'm going to die." But if people think that AI risk is small, they might think "This probably won't be really bad for me, and even though someone should take care of it, it's not going to be me."</li> </ol> &nbsp; <h3>Risks from geomagnetic storms</h3> <p>Large geomagnetic storms like the <a href="http://en.wikipedia.org/wiki/Solar_storm_of_1859">1859 Carrington Event</a> are infrequent, but could cause serious damage to satellites and critical infrastructure. 
See <a href="http://www.oecd.org/gov/risk/46891645.pdf">this OECD report</a> for an overview.</p> <p>Jonah's investigation revealed a wide range in expected losses from geomagnetic storms, from $30 million per year to $30 billion per year. But even this larger number amounts to $1.5 trillion in expected losses over the next 50 years. Compare this with the losses from the 2008 financial crisis (roughly a 1 in 50 years event), which are&nbsp;<a href="http://thinkprogress.org/economy/2012/09/13/846281/financial-crisis-lost-trillions/">estimated</a> to be about $13 trillion for Americans alone.</p> <p>Though serious, the risks from geomagnetic storms appear to be small enough to be disanalogous to the future invention of AGI. &nbsp;</p> <h3><br /></h3> <h3>The eradication of smallpox</h3> <p><a href="http://en.wikipedia.org/wiki/Smallpox">Smallpox</a>, after killing more than 500 million people over the past several millennia, was eradicated in 1979 after a decades-long global eradication effort. Though a hallmark of successful global coordination, it doesn't seem especially relevant to whether policy-makers will handle the invention of AGI wisely.</p> <p>Here's how the eradication of smallpox does or doesn't fit our criteria for being analogous to the future invention of AGI:</p> <ol> <li>Smallpox didn't arrive at an unpredictable time; it arrived millennia before the eradication campaign.</li> <li>The world didn't have experience eradicating a disease before smallpox was eradicated, but a number of nations had eliminated smallpox.</li> <li>Smallpox eradication required solving a difficult global coordination problem, but in a way disanalogous to the AGI safety problem (see the other points on this list).</li> <li>Preparing for smallpox eradication required effort in advance in some sense, but the effort had mostly already been exerted before the campaign was announced.</li> <li>Nations without smallpox had incentive to eradicate smallpox so that they didn't have to keep spending money immunizing citizens to prevent the virus from being (re)-introduced to their countries. For example, in 1968, the United States spent about $100 million on routine smallpox vaccinations.</li> <li>Smallpox can be thought of as a global disaster: by 1966, about 2 million people died of smallpox each year.</li> </ol> &nbsp; <h3>Shallow investigations of&nbsp;risks from cyberwarfare, chlorofluorocarbons, and the Cuban missile crisis</h3> <p>Jonah's shallow investigation of risks from cyberwarfare revealed that experts disagree significantly about the nature and scope of these risks. It's likely that dozens of hours of research would be required to develop a well-informed model of these risks.</p> <p>To investigate how policy-makers handled the discovery that chlorofluorocarbons (CFCs) depleted the ozone layer, Jonah summarized the first 100 pages of&nbsp;<em><a href="http://www.amazon.com/Ozone-Crisis-Evolution-Emergency-Editions/dp/0471528234/">Ozone Crisis: The 15-Year Evolution of a Sudden Global Emergency</a></em>&nbsp;(see our full email exchange for the summary).&nbsp;This historical case seems worth investigating further, and may be a case of policy-makers solving a global risk with surprising swiftness, though whether the response was appropriately prompt is debated.</p> <p>Jonah also did a shallow investigation of the <a href="http://en.wikipedia.org/wiki/Cuban_missile_crisis">Cuban missile crisis</a>.
It's difficult to assess how likely it was for the crisis to escalate into a global nuclear war, but it appears that policy-makers made many poor decisions leading up to and during the Cuban missile crisis (see our full email exchange for a list). Jonah concludes:</p> <blockquote>even if the probability of the Cuban missile crisis leading to an all out nuclear war was only 1% or so, the risk was still sufficiently great so that the way in which the actors handled the situation is evidence against elites handling the creation of AI well. (This contrasts with the situation with climate change, in that elites had strong personal incentives to avert an all-out nuclear war.)</blockquote> <p>However, this is only a guess based on a shallow investigation, and should not be taken too seriously before a more thorough investigation of the historical facts can be made. &nbsp;</p> <h3><br /></h3> <h3>Additional historical cases that could be investigated</h3> <p>We also identified additional historical cases that could be investigated for potentially informative analogies to the future invention of AGI:</p> <ol> <li>The 2003 <a href="http://en.wikipedia.org/wiki/Iraq_War">Iraq War</a></li> <li>The frequency with which dictators are deposed or assassinated due to "unforced errors" they made</li> <li><a href="http://en.wikipedia.org/wiki/Nuclear_proliferation">Nuclear proliferation</a></li> <li><a href="http://en.wikipedia.org/wiki/Recombinant_DNA">Recombinant DNA</a></li> <li><a href="http://www.amazon.com/Radical-Abundance-Revolution-Nanotechnology-Civilization/dp/1610391136/">Molecular nanotechnology</a></li> <li><a href="http://www.amazon.com/Near-Earth-Objects-Finding-Them-Before/dp/0691149291/">Near Earth objects</a></li> <li>Pandemics and potential pandemics (e.g. <a href="http://en.wikipedia.org/wiki/HIV">HIV</a>,&nbsp;<a href="http://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome">SARS</a>)</li> </ol> lukeprog 4JGQtc4ZgmdsrtBhB 2013-09-12T07:21:30.255Z How effectively can we plan for future decades? (initial findings) https://www.lesswrong.com/posts/gYiXucYZBgWdWdxJe/how-effectively-can-we-plan-for-future-decades-initial <p><small>Cross-posted from <a href="http://intelligence.org/2013/09/04/how-effectively-can-we-plan-for-future-decades/">MIRI's blog</a>.</small></p> <p>MIRI aims to do research now that increases humanity's odds of successfully managing important AI-related events that are at least&nbsp;<a href="http://intelligence.org/2013/05/15/when-will-ai-be-created/">a few decades away</a>. Thus, we'd like to know: To what degree can we take actions now that will predictably have positive effects on AI-related events decades from now? And, which factors predict success and failure in planning for decades-distant events that share important features with future AI events?</p> <p>Or, more generally:&nbsp;<strong>How effectively can humans plan for future decades? Which factors predict success and failure in planning for future decades?</strong></p> <p><strong></strong>To investigate these questions, we asked <a href="http://mathisbeauty.org/aboutme.html">Jonah Sinick</a> to examine historical attempts to plan for future decades and summarize his findings. We pre-committed to publishing our entire email exchange on the topic (with minor editing), just as Jonah had done previously with GiveWell&nbsp;<a href="http://blog.givewell.org/2012/10/18/revisiting-the-case-for-insecticide-treated-nets-itns/">on the subject of insecticide-treated nets</a>. 
The post below is a summary of findings from&nbsp;<a href="https://intelligence.org/wp-content/uploads/2013/09/Can-we-know-what-to-do-about-AI.docx">our full email exchange (.docx)</a>&nbsp;so far.</p> <p><strong>We decided to publish our initial findings after investigating only a few historical cases</strong>. This allows us to gain feedback on the value of the project, as well as suggestions for improvement, before continuing. It also means that <strong>we aren't yet able to draw any confident conclusions about our core questions</strong>.</p> <p>The most significant results from this project so far are:</p> <ol> <li>Jonah's initial impressions about&nbsp;<em>The Limits to Growth</em>&nbsp;(1972), a famous forecasting study on population and resource depletion, were that its long-term predictions were mostly wrong, and also that its authors (at the time of writing it) didn't have credentials that would predict forecasting success. Upon reading the book, its critics, and its defenders, Jonah concluded that many critics and defenders had &nbsp;seriously misrepresented the book, and that the book itself&nbsp;exhibits high epistemic standards and does not make significant predictions that turned out to be wrong.</li> <li>Svante Arrhenius (1859-1927) did a surprisingly good job of climate modeling given the limited information available to him, but he was nevertheless wrong about two important policy-relevant factors. First, he failed to predict how quickly carbon emissions would increase. Second, he predicted that global warming would have positive rather than negative humanitarian impacts.&nbsp;If more people had taken Arrhenius' predictions seriously and burned fossil fuels faster for humanitarian reasons, then today's scientific consensus on the effects of climate change suggests that the humanitarian effects would have been negative.</li> <li>In retrospect, Norbert Wiener's concerns about the medium-term dangers of increased automation appear naive, and it seems likely that even at the time, better epistemic practices would have yielded substantially better predictions.</li> <li>Upon initial investigation, several historical cases seemed unlikely to shed substantial light on our &nbsp;core questions: Norman&nbsp;Rasmussen's analysis of the safety of nuclear power plants, Leo Szilard's choice to keep secret a patent related to nuclear chain reactions,&nbsp;Cold War planning efforts to win decades later, and several cases of "ethically concerned scientists."</li> <li>Upon initial investigation, two historical cases seemed like they&nbsp;<em>might</em> shed light on our &nbsp;core questions, but only after many hours of additional research on each of them: China's one-child policy, and the Ford Foundation's impact on India's 1991 financial crisis.</li> <li>We listed many other historical cases that may be worth investigating.</li> </ol> <p>The project has also produced a chapter-by-chapter list of some key lessons from Nate Silver's&nbsp;<a href="http://www.amazon.com/The-Signal-Noise-Many-Predictions/dp/159420411X"><em>The Signal and the Noise</em></a>, available <a href="/lw/hxx/some_highlights_from_nate_silvers_the_signal_and/">here</a>.</p> <p>Further details are given below. 
For sources and more, please see&nbsp;<a href="https://intelligence.org/wp-content/uploads/2013/09/Can-we-know-what-to-do-about-AI.docx">our full email exchange (.docx)</a>.</p> <!--more--> <p><a id="more"></a></p> <h3>The Limits to Growth</h3> <p>In his initial look at&nbsp;<em><a href="http://www.amazon.com/Limits-Growth-Donella-H-Meadows/dp/0451057678/">The Limits to</a></em><a href="http://www.amazon.com/Limits-Growth-Donella-H-Meadows/dp/0451057678/"> Growth</a>&nbsp;(1972), Jonah noted that the authors were fairly young at the time of writing (the oldest was 31), and they lacked credentials in long-term forecasting. Moreover, it appeared that&nbsp;<em>Limits to Growth</em> predicted a sort of doomsday scenario -&nbsp;<em>ala</em> Ehrlich's&nbsp;<em><a href="http://www.amazon.com/The-population-bomb-Paul-Ehrlich/dp/0345021711/">The Population Bomb</a></em>&nbsp;(1968) - that had failed to occur. In particular, it appeared that&nbsp;<em>Limits to Growth</em> had failed to appreciate <a href="http://en.wikipedia.org/wiki/Julian_Lincoln_Simon">Julian Simon</a>'s point that other resources would substitute for depleted resources. Upon reading the book, Jonah found that:</p> <ul> <li>The book avoids strong, unconditional claims.&nbsp;Its core claim is that <em>if</em> exponential growth of resource usage continues, <em>then</em> there will likely be a societal collapse by 2100.</li> <li>The book was careful to qualify its claims, and met high epistemic standards. Jonah wrote: "The book doesn't look naive even in retrospect, which is impressive given that it was written 40 years ago. "</li> <li>The authors discuss substitutability at length in chapter 4.</li> <li>The book discusses mitigation at a theoretical level, but doesn't give explicit policy recommendations, perhaps because the issues involved were too complex.</li> </ul> <h3><br /></h3> <h3>Svante Arrhenius</h3> <p>Derived more than a century ago, <a href="http://en.wikipedia.org/wiki/Svante_Arrhenius">Svante Arrhenius</a>'&nbsp;equation for how the Earth's temperature varies as a function of concentration of carbon dioxide is the same equation used today. But while Arrhenius' climate modeling was impressive given the information available to him at the time, he failed to predict (by a large margin) how quickly fossil fuels would be burned. He also predicted that global warming would have positive humanitarian effects,&nbsp;but based on our current understanding, the expected humanitarian effects seem negative.</p> <p>Arrhenius's predictions were mostly ignored at the time, but had people taken them seriously and burned fossil fuels more quickly,&nbsp;the humanitarian effects would probably have been negative. &nbsp;</p> <h3><br /></h3> <h3>Norbert Wiener</h3> <p>As Jonah explains,&nbsp;<a href="http://en.wikipedia.org/wiki/Norbert_Weiner">Norbert Wiener</a> (1894-1964) "believed that unless countermeasures were taken, automation would render low skilled workers unemployable. He believed that this would precipitate an economic crisis far worse than that of the Great Depression." 
Nearly 50 years after his death, this <a href="/lw/hh4/the_robots_ai_and_unemployment_antifaq/">doesn't seem to have happened</a>&nbsp;much, though it may eventually happen.</p> <p>Jonah's impression is that Wiener had strong views on the subject, doesn't seem to have updated much in response to incoming evidence, and seems to have relied too heavily on what <a href="http://en.wikipedia.org/wiki/The_Hedgehog_and_the_Fox">Berlin (1953)</a> and&nbsp;<a href="http://www.amazon.com/Expert-Political-Judgment-Good-Know/dp/0691128715/">Tetlock (2005)</a> described as "hedgehog" thinking: "the fox knows many things, but the hedgehog knows one big thing." &nbsp;</p> <h3><br /></h3> <h3>Some historical cases that seem unlikely to shed light on our questions</h3> <p><a href="http://en.wikipedia.org/wiki/WASH-1400">Rasmussen (1975)</a> is a probabilistic risk assessment of nuclear power plants, written before any nuclear power plant disasters had occurred. However, Jonah concluded that this historical case wasn't very relevant to our specific questions about taking actions useful for decades-distant AI outcomes, in part because the issue is highly domain-specific, and because the report makes a large number of small predictions rather than a few salient predictions.</p> <p>In 1936,&nbsp;<a href="http://en.wikipedia.org/wiki/Le%C3%B3_Szil%C3%A1rd">Le&oacute; Szil&aacute;rd</a> assigned his&nbsp;chain reaction patent in a way that ensured it would be kept secret from the Nazis. However, Jonah concluded:</p> <blockquote>I think that this isn't a good example of a nontrivial future prediction. The destructive potential seems pretty obvious &ndash; anything that produces a huge amount of concentrated energy can be used in a destructive way. As for the Nazis, Szilard was himself Jewish and fled from the Nazis, and it seems pretty obvious that one wouldn't want a dangerous regime to acquire knowledge that has destructive potential. It would be more impressive if the early developers of quantum mechanics had kept their research secret on account of dimly being aware of the possibility of destructive potential, or if Szilard had filed his patent secretly in a hypothetical world in which the Nazi regime was years away.</blockquote> <p>Jonah briefly investigated Cold War efforts aimed at winning the war decades later, but concluded that it was "too difficult to tie these efforts to war outcomes."</p> <p>Jonah also investigated Kaj Sotala's&nbsp;<a href="/lw/gln/a_brief_history_of_ethically_concerned_scientists/">A brief history of ethically concerned scientists</a>. Most of the historical cases cited there didn't seem relevant to this project. Many cases involved "scientists concealing their discoveries out of concern that they would be used for military purposes," but this seems to be an increasingly irrelevant sort of historical case, since science and technology markets are now relatively efficient, and concealing a discovery rarely delays progress for very long (e.g. see <a href="http://www.amazon.com/What-Technology-Wants-Kevin-Kelly/dp/0143120174/">Kelly 2011</a>). Other cases involved efforts to reduce the use of dangerous weapons for which the threat was imminent during the time of the advocacy. There may be lessons among these cases, but they appear to be of relatively weak relevance to our current project.
&nbsp;&nbsp;</p> <h3><br /></h3> <h3>Some historical cases that might shed light on our questions with much additional research</h3> <p>Jonah performed an initial investigation of the impacts of China's <a href="http://en.wikipedia.org/wiki/One-child_policy">one-child policy</a>, and concluded that it would take many, many hours of research to determine both the sign and the magnitude of the policy's impacts.</p> <p>Jonah also investigated a case involving the <a href="http://www.fordfoundation.org/">Ford Foundation</a>. In <a href="http://www.givewell.org/files/conversations/Lant%20Pritchet%2006-18-12%20final%20for%20upload.pdf">a conversation with GiveWell</a>, Lant Pritchett said:</p> <blockquote>[One] example of transformative philanthropy is related to India&rsquo;s recovery from its economic crisis of 1991. Other countries had previously had similar crises and failed to implement good policies that would have allowed them to recover from their crises. By way of contrast, India implemented good policies and recovered in a short time frame. Most of the key actors who ensured that India implemented the policies that it did were influenced by a think tank established by the Ford Foundation ten years before the crisis. The think tank exposed Indians to relevant ideas from the developed world about liberalization. The difference between (a) India&rsquo;s upward economic trajectory and (b) what its upward economic trajectory would have been if it had been unsuccessful in recovering from the 1991 crisis is in the trillions of dollars. As such, the Ford Foundation&rsquo;s investment in the think tank had a huge impact. For the ten years preceding the crisis, it looked like the think tank was having no impact, but it turned out to have a huge impact.</blockquote> <p>Unfortunately, Jonah was unable to find any sources or contacts that would allow him to check whether this story is true. &nbsp;</p> <h3><br /></h3> <h3>Other historical cases that might be worth investigating</h3> <p>Historical cases we identified but did not yet investigate include:</p> <ul> <li><a href="http://en.wikipedia.org/wiki/K._Eric_Drexler">Eric Drexler</a>'s early predictions about the feasibility and likely effects of nanotechnology.</li> <li>The <a href="http://en.wikipedia.org/wiki/Asilomar_Conference_on_Recombinant_DNA">Asilomar conference on recombinant DNA</a></li> <li>Efforts to <a href="http://www.amazon.com/Near-Earth-Objects-Finding-Them-Before/dp/0691149291/">detect asteroids before they threaten Earth</a></li> <li>The <a href="http://en.wikipedia.org/wiki/Green_Revolution">Green Revolution</a></li> <li>The modern history of <a href="http://en.wikipedia.org/wiki/Cryptography">cryptography</a></li> <li>Early efforts to <a href="http://www.amazon.com/The-Discovery-Global-Warming-Technology/dp/067403189X/">mitigate global warming</a></li> <li>Possible deliberate long term efforts to produce scientific breakthroughs (the transistor? 
the human genome?)</li> <li>Rachel Carson's&nbsp;<a href="http://en.wikipedia.org/wiki/Silent_Spring"><em>Silent Spring</em></a> (1962)</li> <li>Paul Ehrlich's&nbsp;<a href="http://en.wikipedia.org/wiki/The_Population_Bomb"><em>The Population Bomb</em></a> (1968)</li> <li>The Worldwatch Institute's <a href="http://en.wikipedia.org/wiki/State_of_the_World_(book_series)"><em>State of the World</em></a> reports (since 1984)</li> <li>The WCED's&nbsp;<a href="http://en.wikipedia.org/wiki/Our_Common_Future"><em>Our Common Future</em></a> (1987)</li> </ul> &nbsp; lukeprog gYiXucYZBgWdWdxJe 2013-09-04T22:42:05.195Z Which subreddits should we create on Less Wrong? https://www.lesswrong.com/posts/kFwtkx9wtSzssYv9W/which-subreddits-should-we-create-on-less-wrong <p>Less Wrong is based on <a href="http://www.reddit.com/">reddit</a> code, which means we can create <a href="http://www.reddit.com/reddits/">subreddits</a> with relative ease.</p> <p>Right now we have two subreddits, Main and Discussion. These are distinguished not by subject matter, but by whether a post is the <em>type</em> of thing that might be promoted to the front page or not (e.g. a meetup announcement, or a particularly well-composed and useful post).</p> <p>As a result, almost everything is published to Discussion, and thus <strong>it is difficult for busy people to follow only the subjects they care about</strong>. More people will be able to engage if we split things into topic-specific subreddits, and make it easy to follow only what they care about.</p> <p>To make it easier for people to follow only what they care about, we're building the code for a Dashboard thingie.</p> <p>But we also need to figure out <em>which</em> subreddits to create, and we'd like community feedback about that.</p> <p>We'll probably start small, with just 1-5 new subreddits.</p> <p>Below are some initial ideas, to get the conversation started.</p> <p>&nbsp;</p> <p><strong>Idea 1</strong></p> <p> <ul> <li><em>Main</em>: still the place for things that might be promoted.</li> <li><em>Applied Rationality</em>: for articles about&nbsp;<a href="/lw/gs5/improving_human_rationality_through_cognitive/">what Jonathan Baron would call</a>&nbsp;descriptive and prescriptive rationality, for both epistemic and instrumental rationality (stuff about biases, self-improvement stuff, etc.).</li> <li><em>Normative Rationality</em>: for articles about what Baron would call normative rationality, for both epistemic and instrumental rationality (examining the foundations of probability theory, decision theory, anthropics, and lots of stuff that is called "philosophy").&nbsp;</li> <li><em>The Future</em>: for articles about forecasting, x-risk, and future technologies.</li> <li><em>Misc</em>:&nbsp;Discussion, renamed, for everything that doesn't belong in the other subreddits.</li> </ul> <strong></strong></p> <p>&nbsp;</p> <p><strong>Idea 2</strong></p> <p> <ul> <li><em>Main</em></li> <li><em>Epistemic Rationality</em>: for articles about how to figure out the world, spanning the descriptive, prescriptive, and normative.</li> <li><em>Instrumental Rationality</em>: for articles about how to take action to achieve your goals, spanning the descriptive, prescriptive, and normative. (One difficulty with the epistemic/instrumental split is that many (most?) 
applied rationality techniques seem to be relevant to both epistemic and instrumental rationality.)</li> <li><em>The Future</em></li> <li><em>Misc.</em></li> </ul> <div><br /></div> <div><br /></div> </p> lukeprog kFwtkx9wtSzssYv9W 2013-09-04T17:56:33.729Z Artificial explosion of the Sun: a new x-risk? https://www.lesswrong.com/posts/dkvoauE8tXKZKAXDq/artificial-explosion-of-the-sun-a-new-x-risk <p><a href="http://www.scirp.org/journal/PaperDownload.aspx?paperID=34277">Bolonkin &amp; Friedlander (2013)</a> argues that it might be possible for "a dying dictator" to blow up the Sun, and thus destroy all life on Earth:</p> <blockquote> <p>The Sun contains ~74% hydrogen by weight. The isotope hydrogen-1 (99.985% of hydrogen in nature) is a usable fuel for fusion thermonuclear reactions. This reaction runs slowly within the Sun because its temperature is low (relative to the needs of nuclear reactions). If we create higher temperature and density in a limited region of the solar interior, we may be able to produce self-supporting detonation thermonuclear reactions that spread to the full solar volume. This is analogous to the triggering mechanisms in a thermonuclear bomb. Conditions within the bomb can be optimized in a small area to initiate ignition, then spread to a larger area, allowing producing a hydrogen bomb of any power. In the case of the Sun certain targeting practices may greatly increase the chances of an artificial explosion of the Sun. This explosion would annihilate the Earth and the Solar System, as we know them today. The reader naturally asks: Why even contemplate such a horrible scenario? It is necessary because as thermonuclear and space technology spreads to even the least powerful nations in the centuries ahead, a dying dictator having thermonuclear missile weapons can [produce] (with some considerable mobilization of his military/industrial complex)&mdash;an artificial explosion of the Sun and take into his grave the whole of humanity. It might take tens of thousands of people to make and launch the hardware, but only a very few need know the final targeting data of what might be otherwise a weapon purely thought of (within the dictator&rsquo;s defense industry) as being built for peaceful, deterrent use. Those concerned about Man&rsquo;s future must know about this possibility and create some protective system&mdash;or ascertain on theoretical grounds that it is entirely [impossible]. Humanity has fears, justified to greater or lesser degrees, about asteroids, warming of Earthly climate, extinctions, etc. which have very small probability. But all these would leave survivors&mdash;nobody thinks that the terrible annihilation of the Solar System would leave a single person alive. That explosion appears possible at the present time. In this paper is derived the &ldquo;AB-Criterion&rdquo; which shows conditions wherein the artificial explosion of Sun is possible. 
The author urges detailed investigation and proving or disproving of this rather horrifying possibility, so that it may be dismissed from mind&mdash;or defended against.</p> </blockquote> <p><strong>Warning</strong>: the paper is published in an obscure journal by publisher #206 on&nbsp;<a href="http://scholarlyoa.com/2012/12/06/bealls-list-of-predatory-publishers-2013/">Beall&rsquo;s List of Predatory Publishers 2013</a>, and I was unable to find confirmation of the authors' <a href="http://lifeboat.com/ex/bios.alexander.bolonkin">claimed</a> <a href="http://lifeboat.com/ex/bios.joseph.friedlander">credentials</a> from any reputable sources with 5 minutes of Googling. It also has two spelling errors <em>in the abstract</em>. (It has no citations on Google scholar, but I wouldn't expect it to have any since it was only released in July 2013.)</p> <p>I haven't read the paper, and I'd love to see someone fluent in astrophysics comment on its contents.&nbsp;</p> <p>My guess is that this is <em>not a risk at all</em>&nbsp;or,&nbsp;as with <a href="http://arxiv.org/pdf/hep-ph/9910333.pdf">proposed high-energy physics disasters</a>,&nbsp;the risk is extremely low-probability but physically conceivable (though perhaps not by methods imagined by Bolonkin &amp; Friedlander).&nbsp;</p> lukeprog dkvoauE8tXKZKAXDq 2013-09-02T06:12:39.019Z Transparency in safety-critical systems https://www.lesswrong.com/posts/9xLMeCix89i8ozuqS/transparency-in-safety-critical-systems <p>I've just posted an analysis to MIRI's blog called <a href="http://intelligence.org/2013/08/25/transparency-in-safety-critical-systems/">Transparency in Safety-Critical Systems</a>. Its aim is to explain a common view about transparency and system reliability, and then open a dialogue about which parts of that view are wrong, or don't apply well to AGI.</p> <p>The "common view" (not universal by any means) explained in the post is, roughly:</p> <blockquote> <p>Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI&rsquo;s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.</p> </blockquote> <p>Three caveats / open problems listed at the end of the post are:</p> <p><ol> <li>How does the transparency of a method change with scale? A 200-rules logical AI might be more transparent than a 200-node Bayes net, but what if we&rsquo;re comparing 100,000 rules vs. 100,000 nodes? At least we can query the Bayes net to ask &ldquo;what it believes about X,&rdquo; whereas we can&rsquo;t necessarily do so with the logic-based system.</li> <li>Do the categories above really &ldquo;carve reality at its joints&rdquo; with respect to transparency? Does a system&rsquo;s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?</li> <li>How much of a system&rsquo;s transparency is &ldquo;intrinsic&rdquo; to the system, and how much of it depends on the quality of the user interface used to inspect it? 
How much of a &ldquo;transparency boost&rdquo; can different kinds of systems get from excellently designed user interfaces?</li> </ol></p> <p>The MIRI blog has only recently begun to regularly host substantive, non-news content, so it doesn't get much commenting action yet. Thus, I figured I'd post here and try to start a dialogue. Comment away!</p> lukeprog 9xLMeCix89i8ozuqS 2013-08-25T18:52:07.757Z How Efficient is the Charitable Market? https://www.lesswrong.com/posts/JBKrSNEejyE7nYmqq/how-efficient-is-the-charitable-market <p>When I talk about the <a href="http://www.amazon.com/With-Charity-All-Charities-Failing/dp/038553471X/">poor distribution of funds in charity</a>, people in the <a href="/lw/hx4/four_focus_areas_of_effective_altruism/">effective altruism</a> movement sometimes say, "Didn't Holden Karnofsky show that charity is an efficient market in his post <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/">Broad Market Efficiency</a>?"</p> <p>My reply is "No. Holden never said, and doesn't believe, that charity is an efficient market."</p> <p>&nbsp;</p> <h4>What is an efficient market?</h4> <p>An efficient market is one <a href="http://en.wikipedia.org/wiki/Efficient-market_hypothesis">in which</a> "one cannot consistently achieve returns in excess of average market returns... given the information available at the time the investment is made." (Details <a href="http://highered.mcgraw-hill.com/sites/dl/free/007338240x/773409/Sample_Chapter_8_New.pdf">here</a>.)</p> <p>Of course, market efficiency is a spectrum, not a yes/no question. As Holden writes, "The most efficient markets can be consistently beaten only by the most talented/dedicated players, while the least efficient [markets] can be beaten with fairly little in the way of talent and dedication."</p> <p>Moreover, market efficiency is multi-dimensional. Any particular market may be efficient in some ways, and in some domains, while highly inefficient in other ways and other domains.</p> <p><a id="more"></a></p> <h4><br /></h4> <h4>Charity as an inefficient market</h4> <p>Financial markets are relatively efficient. It's rare for players to consistently beat the market by a large margin. You can beat the average by investing in a <a href="https://personal.vanguard.com/us/whatweoffer/mutualfundinvesting/indexfunds">low-fee index fund</a>, but not by a lot, and it's hard to beat hedge funds.</p> <p>Philanthropic markets appear to be less efficient than financial markets in many ways. In charity, one can consistently beat the market by a wide margin simply by giving to <a href="http://www.givewell.org/charities/top-charities">GiveWell's recommended charities</a>, which achieve far greater returns (in social value) per marginal dollar than the average charity does. However, Holden <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/">points out</a> that it has been surprisingly difficult for GiveWell to find ways to beat <em>well-run large foundations</em> like the <a href="http://www.gatesfoundation.org/">Gates Foundation</a>'s work in global health.</p> <p>Why should we expect charity to be less efficient than financial markets?</p> <p>For one thing, most people giving to charity don't even seem to <em>care</em> what returns (in social value) they're getting with their investments. 
That's why, when proto-GiveWell initially contacted a bunch of charities to ask for evidence of positive impact, some of those charities reported that nobody who gave them money had ever <em>asked</em> that question before. And when charities sent proto-GiveWell their internal reports about effectiveness, they were so inadequate that they "led [proto-GiveWell] to understand that the charities <em>themselves</em> did not know whether they were helping or hurting a given situation" (<a href="http://www.amazon.com/With-Charity-All-Charities-Failing/dp/038553471X/">Stern 2012</a>).</p> <p>For another thing, "market incentives of the nonprofit world push charities toward happy anecdote and inspiring narrative rather than toward careful planning, research, and evidence-based investments" (details <a href="http://www.amazon.com/With-Charity-All-Charities-Failing/dp/038553471X/">here</a>).</p> <p>Also, as Brian Tomasik <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/comment-page-1/#comment-542070">notes</a>, "Efficiency in the realm of charity is inherently less plausible than in financial markets because in charity there&rsquo;s not a common unit of what 'good' means... Indeed, one man&rsquo;s good may be another man&rsquo;s bad (e.g., abortion, gun control, extinction risks)." But even when we focus on relatively common units of 'good' (e.g. human welfare, QALYs, or DALYs), charity is still relatively inefficient: we can easily purchase more QALYs per dollar via <a href="http://www.givewell.org/international/top-charities/AMF">AMF</a> than via, say, the popular <a href="http://en.wikipedia.org/wiki/Make-A-Wish_Foundation">Make-a-Wish Foundation</a>.</p> <p>&nbsp;</p> <h4>What is "broad market efficiency", then?</h4> <p>If Holden agrees that philanthropic markets are relatively inefficient in the sense that it's easy to consistently and substantially beat average market returns by giving to GiveWell's recommended charities, then what does he mean by "broad market efficiency"? Holden introduces "broad market efficiency" as a term for the <em>spectrum</em> of market efficiency, but remains uncertain as to where charity falls on that spectrum of market efficiency.</p> <p>Brian Tomasik <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/comment-page-1/#comment-542070">worried</a> that the term "broad market efficiency" would confuse some readers into thinking Holden was claiming that philanthropic markets are relatively efficient and thus that "it doesn&rsquo;t really matter where you donate." Holden <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/comment-page-1/#comment-542975">said</a> he wasn't worried about this, saying, "I don&rsquo;t think 'broad market efficiency' is a common phrase or one with a clear meaning." 
But I think the phrase <em>is</em> confusing, that many readers interpret it as meaning "market efficiency," and indeed that people in economics and finance sometimes use it that way: search for the phrase "broad market efficiency" <a href="http://commonsenseatheism.com/wp-content/uploads/2013/08/Singal-Beyond-the-Random-Walk-Preface.pdf">here</a>, <a href="http://commonsenseatheism.com/wp-content/uploads/2013/08/Blackrock-Securities-Lending-unlocking-the-full-potential-of-investment-porfolios.pdf">here</a>, <a href="http://commonsenseatheism.com/wp-content/uploads/2013/08/Ooi-et-al-Extrapolation-theory-and-hte-pricing-of-REIT-stocks.pdf">here</a>, and <a href="http://www.greaterfool.ca/2011/04/10/depraved/#comment-94887">here</a>.</p> <p>&nbsp;</p> <h4>The research ahead</h4> <p>So how efficient <em>is</em> the charitable market, and in which ways? My own guess is that it's far less efficient than financial markets, but GiveWell's research has <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/">provided</a> valuable and surprising (to me) information on this topic, and I look forward to future discoveries.</p> lukeprog JBKrSNEejyE7nYmqq 2013-08-24T05:57:48.169Z Engaging Intellectual Elites at Less Wrong https://www.lesswrong.com/posts/dzPaFnf9cWiP3wbG9/engaging-intellectual-elites-at-less-wrong <p>Is Less Wrong, despite its flaws, the highest-quality relatively-general-interest forum on the web? It seems to me that, to find reliably higher-quality discussion, I must turn to more narrowly focused sites, e.g. <a href="http://mathoverflow.net/">MathOverflow</a> and the <a href="http://blog.givewell.org/">GiveWell blog</a>.</p> <p>Many people smarter than myself have reported the same impression. But if you know of any comparably high-quality relatively-general-interest forums, please link me to them!</p> <p>In the meantime: suppose it's true that Less Wrong is the highest-quality relatively-general-interest forum on the web. In that case, we're sitting on a big opportunity to grow Less Wrong into the "standard" general-interest discussion hub for people with high intelligence and high metacognition (shorthand: "intellectual elites").</p> <p>Earlier, Jonah Sinick lamented <a href="/lw/hky/the_paucity_of_elites_online/">the scarcity of elites on the web</a>. How can we get more intellectual elites to engage on the web, and in particular at Less Wrong?</p> <p>Some projects to improve the situation are extremely costly:</p> <ol> <li>Pay some intellectual elites with unusually good writing skills (like Eliezer) to generate a constant stream of new, interesting content.</li> <li>Comb through Less Wrong to replace community-specific jargon with more universally comprehensible terms, and change community norms about jargon. (E.g. GiveWell's jargon tends to be more transparent, such as their phrase "room for more funding.")</li> </ol> <p>Code changes, however, could be significantly less costly. New features or site structure elements could increase engagement by intellectual elites.
(To avoid <a href="/lw/k3/priming_and_contamination/">priming and contamination</a>, I'll hold back from naming specific examples here.)</p> <p>To help us figure out which code changes are most likely to increase engagement on Less Wrong by intellectual elites, specific MIRI volunteers will be interviewing intellectual elites who (1) are familiar enough with Less Wrong to be able to simulate which code changes might cause them to engage more, but who (2) mostly just lurk, currently.</p> <p>In the meantime, I figured I'd throw these ideas to the community for feedback and suggestions.</p> lukeprog dzPaFnf9cWiP3wbG9 2013-08-13T17:55:05.719Z How to Measure Anything https://www.lesswrong.com/posts/ybYBCK9D7MZCcdArB/how-to-measure-anything <p><a href="http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/"><img style="padding: 30px;" src="http://commonsenseatheism.com/wp-content/uploads/2013/08/how-to-measure-anything.jpeg" alt="" align="right" /></a>Douglas Hubbard&rsquo;s <em><a href="http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/">How to Measure Anything</a></em> is one of my favorite how-to books. I hope this summary inspires you to buy the book; it&rsquo;s worth it.</p> <p>The book opens:</p> <blockquote> <p>Anything can be measured. If a thing can be observed in any way at all, it lends itself to some type of measurement method. No matter how &ldquo;fuzzy&rdquo; the measurement is, it&rsquo;s still a measurement if it tells you more than you knew before. And those very things most likely to be seen as immeasurable are, virtually always, solved by relatively simple measurement methods.</p> </blockquote> <p>The sciences have many established measurement methods, so Hubbard&rsquo;s book focuses on the measurement of &ldquo;business intangibles&rdquo; that are important for decision-making but tricky to measure: things like management effectiveness, the &ldquo;flexibility&rdquo; to create new products, the risk of bankruptcy, and public image.</p> <p>&nbsp;</p> <h3 id="basicideas">Basic Ideas</h3> <p>A <em>measurement</em> is an observation that quantitatively reduces uncertainty. Measurements might not yield precise, certain judgments, but they <em>do</em> reduce your uncertainty.</p> <p>To be measured, the <em>object of measurement</em> must be described clearly, in terms of observables. A good way to clarify a vague object of measurement like &ldquo;IT security&rdquo; is to ask &ldquo;What is IT security, and why do you care?&rdquo; Such probing can reveal that &ldquo;IT security&rdquo; means things like a reduction in unauthorized intrusions and malware attacks, which the IT department cares about because these things result in lost productivity, fraud losses, and legal liabilities.</p> <p><em>Uncertainty</em> is the lack of certainty: the true outcome/state/value is not known.</p> <p><em>Risk</em> is a state of uncertainty in which some of the possibilities involve a loss.</p> <p>Much pessimism about measurement comes from a lack of experience making measurements. 
Hubbard, who is <em>far</em> more experienced with measurement than his readers, says:</p> <ol> <li>Your problem is not as unique as you think.</li> <li>You have more data than you think.</li> <li>You need less data than you think.</li> <li>An adequate amount of new data is more accessible than you think.</li> </ol> <h3 id="appliedinformationeconomics"><br /></h3> <h3>Applied Information Economics</h3> <p>Hubbard calls his method &ldquo;Applied Information Economics&rdquo; (AIE). It consists of 5 steps:</p> <ol> <li>Define a decision problem and the relevant variables. (Start with the decision you need to make, then figure out which variables would make your decision easier if you had better estimates of their values.)</li> <li>Determine what you know. (Quantify your uncertainty about those variables in terms of ranges and probabilities.)</li> <li>Pick a variable, and compute the value of additional information for that variable. (Repeat until you find a variable with reasonably high information value. If no remaining variables have enough information value to justify the cost of measuring them, skip to step 5.)</li> <li>Apply the relevant measurement instrument(s) to the high-information-value variable. (Then go back to step 3.)</li> <li>Make a decision and act on it. (When you&rsquo;ve done as much uncertainty reduction as is economically justified, it&rsquo;s time to act!)</li> </ol> <p>These steps are elaborated below.</p> <h3 id="step1:defineadecisionproblemandtherelevantvariables"><a id="more"></a><br /></h3> <h3>Step 1: Define a decision problem and the relevant variables</h3> <p>Hubbard illustrates this step by telling the story of how he helped the Department of Veterans Affairs (VA) with a measurement problem.</p> <p>The VA was considering seven proposed IT security projects. They wanted to know &ldquo;which&hellip; of the proposed investments were justified and, after they were implemented, whether improvements in security justified further investment&hellip;&rdquo; Hubbard asked his standard questions: &ldquo;What do you mean by &lsquo;IT security&rsquo;? Why does it matter to you? What are you observing when you observe improved IT security?&rdquo;</p> <p>It became clear that <em>nobody</em> at the VA had thought about the details of what &ldquo;IT security&rdquo; meant to them. But after Hubbard&rsquo;s probing, it became clear that by &ldquo;IT security&rdquo; they meant a reduction in the frequency and severity of some undesirable events: agency-wide virus attacks, unauthorized system access (external or internal),unauthorized physical access, and disasters affecting the IT infrastructure (fire, flood, etc.) And each undesirable event was on the list because of specific costs associated with it: productivity losses from virus attacks, legal liability from unauthorized system access, etc.</p> <p>Now that the VA knew what they meant by &ldquo;IT security,&rdquo; they could measure specific variables, such as the number of virus attacks per year.</p> <h3 id="step2:determinewhatyouknow"><br /></h3> <h3>Step 2: Determine what you know</h3> <h4 id="uncertaintyandcalibration">Uncertainty and calibration</h4> <p>The next step is to determine your level of uncertainty about the variables you want to measure. To do this, you can express a &ldquo;confidence interval&rdquo; (CI). A 90% CI is a range of values that is 90% likely to contain the correct value. 
For example, the security experts at the VA were 90% confident that each agency-wide virus attack would affect between 25,000 and 65,000 people.</p> <p>Unfortunately, few people are well-calibrated estimators. For example, in some studies, the true value lay in subjects&rsquo; 90% CIs only 50% of the time! These subjects were overconfident. For a well-calibrated estimator, the true value will lie in her 90% CI roughly 90% of the time.</p> <p>Luckily, &ldquo;assessing uncertainty is a general skill that can be taught with a measurable improvement.&rdquo;</p> <p>Hubbard uses several methods to calibrate each client&rsquo;s value estimators, for example the security experts at the VA who needed to estimate the frequency of security breaches and their likely costs.</p> <p>His first technique is the <em>equivalent bet test</em>. Suppose you&rsquo;re asked to give a 90% CI for the year in which Newton published the universal laws of gravitation, and you can win $1,000 in one of two ways:</p> <ol> <li>You win $1,000 if the true year of publication falls within your 90% CI. Otherwise, you win nothing.</li> <li>You spin a dial divided into two &ldquo;pie slices,&rdquo; one covering 10% of the dial, and the other covering 90%. If the dial lands on the small slice, you win nothing. If it lands on the big slice, you win $1,000.</li> </ol> <p>If you find yourself preferring option #2, then you must think spinning the dial has a higher chance of winning you $1,000 than option #1. That suggests your stated 90% CI isn&rsquo;t really your 90% CI. Maybe it&rsquo;s your 65% CI or your 80% CI instead. By preferring option #2, your brain is trying to tell you that your originally stated 90% CI is overconfident.</p> <p>If instead you find yourself preferring option #1, then you must think there is <em>more</em> than a 90% chance your stated 90% CI contains the true value. By preferring option #1, your brain is trying to tell you that your original 90% CI is underconfident.</p> <p>To make a better estimate, adjust your 90% CI until option #1 and option #2 seem equally good to you. Research suggests that even <em>pretending</em> to bet money in this way will improve your calibration.</p> <p>Hubbard&rsquo;s second method for improving calibration is simply <em>repetition and feedback</em>. Make lots of estimates and then see how well you did. For this, play CFAR&rsquo;s <a href="http://acritch.com/credence-game/">Calibration Game</a>.</p> <p>Hubbard also asks people to identify reasons why a particular estimate might be right, and why it might be wrong.</p> <p>He also asks people to look more closely at each bound (upper and lower) on their estimated range. A 90% CI &ldquo;means there is a 5% chance the true value could be greater than the upper bound, and a 5% chance it could be less than the lower bound. This means the estimators must be 95% sure that the true value is less than the upper bound. If they are not that certain, they should increase the upper bound&hellip; A similar test is applied to the lower bound.&rdquo;</p> <h4 id="simulations"><br /></h4> <h4>Simulations</h4> <p>Once you determine what you know about the uncertainties involved, how can you use that information to determine what you know about the <em>risks</em> involved?
Hubbard summarizes:</p> <blockquote> <p>&hellip;all risk in any project&hellip; can be expressed by one method: the ranges of uncertainty on the costs and benefits, and probabilities on events that might affect them.</p> </blockquote> <p>The simplest tool for measuring such risks accurately is the Monte Carlo (MC) simulation, which can be run by Excel and many other programs. To illustrate this tool, suppose you are wondering whether to lease a new machine for one step in your manufacturing process.</p> <blockquote> <p>The one-year lease [for the machine] is $400,000 with no option for early cancellation. So if you aren&rsquo;t breaking even, you are still stuck with it for the rest of the year. You are considering signing the contract because you think the more advanced device will save some labor and raw materials and because you think the maintenance cost will be lower than the existing process.</p> </blockquote> <p>Your pre-calibrated estimators give their 90% CIs for the following variables:</p> <ul> <li>Maintenance savings (MS): $10 to $20 per unit</li> <li>Labor savings (LS): -$2 to $8 per unit</li> <li>Raw materials savings (RMS): $3 to $9 per unit</li> <li>Production level (PL): 15,000 to 35,000 units per year</li> </ul> <p>Thus, your annual savings will equal (MS + LS + RMS) &times; PL.</p> <p>When measuring risk, we don&rsquo;t just want to know the &ldquo;average&rdquo; risk or benefit. We want to know the probability of a huge loss, the probability of a small loss, the probability of a huge savings, and so on. That&rsquo;s what Monte Carlo can tell us.</p> <p>An MC simulation uses a computer to randomly generate thousands of possible values for each variable, based on the ranges we&rsquo;ve estimated. The computer then calculates the outcome (in this case, the annual savings) for each generated combination of values, and we&rsquo;re able to see how often different kinds of outcomes occur.</p> <p>To run an MC simulation we need not just the 90% CI for each variable but also the <em>shape</em> of each distribution. In many cases, the <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal distribution</a> will work just fine, and we&rsquo;ll use it for all the variables in this simplified illustration. (Hubbard&rsquo;s book shows you how to work with other distributions).</p> <p>To make an MC simulation of a normally distributed variable in Excel, we use this formula:</p> <blockquote> <p>=norminv(rand(), mean, standard deviation)</p> </blockquote> <p>So the formula for the maintenance savings variable should be:</p> <blockquote> <p>=norminv(rand(), 15, (20&ndash;10)/3.29)</p> </blockquote> <p>Suppose you enter this formula on cell A1 in Excel. To generate (say) 10,000 values for the maintenance savings value, just (1) copy the contents of cell A1, (2) enter &ldquo;A1:A10000&rdquo; in the cell range field to select cells A1 through A10000, and (3) paste the formula into all those cells.</p> <p>Now we can follow this process in other columns for the other variables, including a column for the &ldquo;total savings&rdquo; formula. To see how many rows made a total savings of $400,000 or more (break-even), use Excel&rsquo;s <a href="http://www.techonthenet.com/excel/formulas/countif.php">countif</a> function. 
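<p>For readers who would rather script this than build it in a spreadsheet, here is a minimal Monte Carlo sketch in Python with NumPy (the language and library are my choice &ndash; the book itself works in Excel). The helper name <em>ci90_to_sd</em> and the fixed random seed are mine; the normal-distribution choice, the four 90% CIs, and the 3.29 divisor are the ones from the example above.</p> <pre>
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of simulated scenarios

def ci90_to_sd(lo, hi):
    # A 90% CI on a normal distribution spans roughly 3.29 standard deviations.
    return (hi - lo) / 3.29

# The four calibrated 90% CIs, each modeled as a normal distribution.
ms  = rng.normal(15,     ci90_to_sd(10, 20), n)          # maintenance savings, $/unit
ls  = rng.normal(3,      ci90_to_sd(-2, 8), n)           # labor savings, $/unit
rms = rng.normal(6,      ci90_to_sd(3, 9), n)            # raw materials savings, $/unit
pl  = rng.normal(25_000, ci90_to_sd(15_000, 35_000), n)  # production level, units/year

savings = (ms + ls + rms) * pl              # annual savings in each scenario
loss_fraction = np.mean(400_000 > savings)  # share of scenarios below the $400,000 lease cost
print(f"Fraction of scenarios that lose money: {loss_fraction:.0%}")
</pre> <p>Plotting a histogram of <em>savings</em> gives the same risk picture described below.</p>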
In this case, you should find that about 14% of the scenarios resulted in a savings of less than $400,000 &ndash; a loss.</p> <p><img src="http://commonsenseatheism.com/wp-content/uploads/2013/08/histogram-of-MC-sim.png" alt="" align="right" />We can also make a histogram (see right) to show how many of the 10,000 scenarios landed in each $100,000 increment (of total savings). This is even more informative, and tells us a great deal about the distribution of risk and benefits we might incur from investing in the new machine. (Download the full spreadsheet for this example <a href="http://www.hubbardresearch.com/htma-downloads/">here</a>.)</p> <p>The simulation concept can (and in high-value cases <em>should</em>) be carried beyond this simple MC simulation. The first step is to learn how to use a greater variety of distributions in MC simulations. The second step is to deal with correlated (rather than independent) variables by generating correlated random numbers or by modeling what the variables have in common.</p> <p>A more complicated step is to use a <a href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov simulation</a>, in which the simulated scenario is divided into many time intervals. This is often used to model stock prices, the weather, and complex manufacturing or construction projects. Another more complicated step is to use an <a href="http://en.wikipedia.org/wiki/Agent-based_model">agent-based model</a>, in which independently-acting agents are simulated. This method is often used for traffic simulations, in which each vehicle is modeled as an agent.</p> <h3 id="step3:pickavariableandcomputethevalueofadditionalinformationforthatvariable"><br /></h3> <h3>Step 3: Pick a variable, and compute the value of additional information for that variable</h3> <p>Information can have three kinds of value:</p> <ol> <li>Information can affect people&rsquo;s behavior (e.g. common knowledge of germs affects sanitation behavior).</li> <li>Information can have its own market value (e.g. you can sell a book with useful information).</li> <li>Information can reduce uncertainty about important decisions. (This is what we&rsquo;re focusing on here.)</li> </ol> <p>When you&rsquo;re uncertain about a decision, this means there&rsquo;s a chance you&rsquo;ll make a non-optimal choice. The cost of a &ldquo;wrong&rdquo; decision is the difference between the wrong choice and the choice you would have made with perfect information. But it&rsquo;s too costly to acquire perfect information, so instead we&rsquo;d like to know which decision-relevant variables are the <em>most</em> valuable to measure more precisely, so we can decide which measurements to make.</p> <p>Here&rsquo;s a simple example:</p> <blockquote> <p>Suppose you could make $40 million profit if [an advertisement] works and lose $5 million (the cost of the campaign) if it fails. Then suppose your calibrated experts say they would put a 40% chance of failure on the campaign.</p> </blockquote> <p>The expected opportunity loss (EOL) for a choice is the probability of the choice being &ldquo;wrong&rdquo; times the cost of it being wrong. So for example the EOL if the campaign is approved is $5M &times; 40% = $2M, and the EOL if the campaign is rejected is $40M &times; 60% = $24M.</p> <p>The difference between EOL before and after a measurement is called the &ldquo;expected value of information&rdquo; (EVI).</p> <p>In most cases, we want to compute the VoI for a range of values rather than a binary succeed/fail. 
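<p>Before moving to the range-valued case, here is a tiny Python sketch of the binary calculation just described. The dollar figures and the 40% failure probability are from the example above; the final line relies on the standard decision-analysis identity that, for a yes/no decision, the EOL of the better current choice equals the expected value of perfect information &ndash; that identity is my addition, not a quote from the book.</p> <pre>
# Binary advertising-campaign decision: approve or reject?
p_fail = 0.40                # calibrated probability that the campaign fails
gain_if_works = 40_000_000   # profit if the campaign works
cost_if_fails = 5_000_000    # cost of the campaign if it fails

# Expected opportunity loss (EOL) = P(the choice is wrong) x cost of being wrong.
eol_approve = p_fail * cost_if_fails          # $2M
eol_reject  = (1 - p_fail) * gain_if_works    # $24M

# Perfect information would always steer you to the right choice, so the EOL of the
# better current choice is the most that perfect information could be worth here.
evpi = min(eol_approve, eol_reject)
print(f"EOL(approve) = ${eol_approve:,.0f}  EOL(reject) = ${eol_reject:,.0f}  EVPI = ${evpi:,.0f}")
</pre> <p>The range-valued example that follows generalizes this idea by summing expected opportunity losses over many small slices of the prior distribution.</p>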
So let&rsquo;s tweak the advertising campaign example and say that a calibrated marketing expert&rsquo;s 90% CI for sales resulting from the campaign was from 100,000 units to 1 million units. The risk is that we don&rsquo;t sell enough units from this campaign to break even.</p> <p>Suppose we profit by $25 per unit sold, so we&rsquo;d have to sell at least 200,000 units from the campaign to break even (on a $5M campaign). To begin, let&rsquo;s calculate the expected value of <em>perfect</em> information (EVPI), which will provide an upper bound on how much we should spend to reduce our uncertainty about how many units will be sold as a result of the campaign. Here&rsquo;s how we compute it:</p> <ol> <li>Slice the distribution of our variable into thousands of small segments.</li> <li>Compute the EOL for each segment. EOL = segment midpoint times segment probability.</li> <li>Sum the products from step 2 for all segments.</li> </ol> <p>Of course, we&rsquo;ll do this with a computer. For the details, see Hubbard&rsquo;s book and the Value of Information spreadsheet from <a href="http://www.hubbardresearch.com/htma-downloads/">his website</a>.</p> <p>In this case, the EVPI turns out to be about $337,000. This means that we shouldn&rsquo;t spend more than $337,000 to reduce our uncertainty about how many units will be sold as a result of the campaign.</p> <p>And in fact, we should probably spend much less than $337,000, because no measurement we make will give us <em>perfect</em> information. For more details on how to measure the value of <em>imperfect</em> information, see Hubbard&rsquo;s book and these three LessWrong posts: (1) <a href="/lw/cih/value_of_information_8_examples/">VoI: 8 Examples</a>, (2) <a href="/lw/85x/value_of_information_four_examples/">VoI: Four Examples</a>, and (3) <a href="/lw/8j4/5second_level_case_study_value_of_information/">5-second level case study: VoI</a>.</p> <p>I do, however, want to quote Hubbard&rsquo;s comments about the &ldquo;measurement inversion&rdquo;:</p> <blockquote> <p>By 1999, I had completed the&hellip; Applied Information Economics analysis on about 20 major [IT] investments&hellip; Each of these business cases had 40 to 80 variables, such as initial development costs, adoption rate, productivity improvement, revenue growth, and so on. 
For each of these business cases, I ran a macro in Excel that computed the information value for each variable&hellip; [and] I began to see this pattern: * The vast majority of variables had an information value of zero&hellip; * The variables that had high information values were routinely those that the client had never measured&hellip; * The variables that clients [spent] the most time measuring were usually those with a very low (even zero) information value&hellip; &hellip;since then, I&rsquo;ve applied this same test to another 40 projects, and&hellip; [I&rsquo;ve] noticed the same phenomena arise in projects relating to research and development, military logistics, the environment, venture capital, and facilities expansion.</p> </blockquote> <p>Hubbard calls this the &ldquo;Measurement Inversion&rdquo;:</p> <blockquote> <p>In a business case, the economic value of measuring a variable is usually inversely proportional to how much measurement attention it usually gets.</p> </blockquote> <p>Here is one example:</p> <blockquote> <p>A stark illustration of the Measurement Inversion for IT projects can be seen in a large UK-based insurance client of mine that was an avid user of a software complexity measurement method called &ldquo;function points.&rdquo; This method was popular in the 1980s and 1990s as a basis of estimating the effort for large software development efforts. This organization had done a very good job of tracking initial estimates, function point estimates, and actual effort expended for over 300 IT projects. The estimation required three or four full-time persons as &ldquo;certified&rdquo; function point counters&hellip;</p> </blockquote> <blockquote> <p>But a very interesting pattern arose when I compared the function point estimates to the initial estimates provided by project managers&hellip; The costly, time-intensive function point counting did change the initial estimate but, on average, it was no closer to the actual project effort than the initial effort&hellip; Not only was this the single largest measurement effort in the IT organization, it literally added <em>no</em> value since it didn&rsquo;t reduce uncertainty at all. Certainly, more emphasis on measuring the benefits of the proposed projects &ndash; or almost anything else &ndash; would have been better money spent.</p> </blockquote> <p>Hence the importance of calculating EVI.</p> <h3 id="step4:applytherelevantmeasurementinstrumentstothehigh-information-valuevariable"><br /></h3> <h3>Step 4: Apply the relevant measurement instrument(s) to the high-information-value variable</h3> <p>If you followed the first three steps, then you&rsquo;ve defined a variable you want to measure in terms of the decision it affects and how you observe it, you&rsquo;ve quantified your uncertainty about it, and you&rsquo;ve calculated the value of gaining additional information about it. Now it&rsquo;s time to reduce your uncertainty about the variable &ndash; that is, to measure it.</p> <p>Each scientific discipline has its own specialized measurement methods. 
Hubbard&rsquo;s book describes measurement methods that are often useful for reducing our uncertainty about the &ldquo;softer&rdquo; topics often encountered by decision-makers in business.</p> <h4 id="selectingameasurementmethod"><br /></h4> <h4>Selecting a measurement method</h4> <p>To figure out which category of measurement methods is appropriate for a particular case, we must ask several questions:</p> <ol> <li>Decomposition: Which parts of the thing are we uncertain about?</li> <li>Secondary research: How has the thing (or its parts) been measured by others?</li> <li>Observation: How do the identified observables lend themselves to measurement?</li> <li>Measure just enough: How much do we need to measure it?</li> <li>Consider the error: How might our observations be misleading?</li> </ol> <h5 id="decomposition"><br /></h5> <h5>Decomposition</h5> <p>Sometimes you&rsquo;ll want to start by decomposing an uncertain variable into several parts to identify which observables you can most easily measure. For example, rather than directly estimating the cost of a large construction project, you could break it into parts and estimate the cost of each part of the project.</p> <p>In Hubbard&rsquo;s experience, it&rsquo;s often the case that decomposition itself &ndash; even without making any new measurements &ndash; reduces one&rsquo;s uncertainty about the variable of interest.</p> <h5 id="secondaryresearch"><br /></h5> <h5>Secondary research</h5> <p>Don&rsquo;t reinvent the world. In almost all cases, someone has already invented the measurement tool you need, and you just need to find it. Here are Hubbard&rsquo;s tips on secondary research:</p> <ol> <li>If you&rsquo;re new to a topic, start with Wikipedia rather than Google. Wikipedia will give you a more organized perspective on the topic at hand.</li> <li>Use search terms often associated with quantitative data. E.g. don&rsquo;t just search for &ldquo;software quality&rdquo; or &ldquo;customer perception&rdquo; &ndash; add terms like &ldquo;table,&rdquo; &ldquo;survey,&rdquo; &ldquo;control group,&rdquo; and &ldquo;standard deviation.&rdquo;</li> <li>Think of internet research in two levels: general search engines and topic-specific repositories (e.g. the CIA World Fact Book).</li> <li>Try multiple search engines.</li> <li>If you find marginally related research that doesn&rsquo;t directly address your topic of interest, check the bibliography for more relevant reading material.</li> </ol> <p>I&rsquo;d also recommend my post <a href="/lw/5me/scholarship_how_to_do_it_efficiently/">Scholarship: How to Do It Efficiently</a>.</p> <h5 id="observation"><br /></h5> <h5>Observation</h5> <p>If you&rsquo;re not sure how to measure your target variable&rsquo;s observables, ask these questions:</p> <ol> <li>Does it leave a trail? Example: longer waits on customer support lines cause customers to hang up and not call back. Maybe you can also find a correlation between customers who hang up after long waits and reduced sales to those customers.</li> <li>Can you observe it directly? Maybe you haven&rsquo;t been tracking how many of the customers in your parking lot show an out-of-state license, but you could start. Or at least, you can observe a sample of these data.</li> <li>Can you create a way to observe it indirectly? Amazon.com added a gift-wrapping feature in part so they could better track how many books were being purchased as gifts.
Another example is when consumers are given coupons so that retailers can see which newspapers their customers read.</li> <li>Can the thing be forced to occur under new conditions which allow you to observe it more easily? E.g. you could implement a proposed returned-items policy in some stores but not others and compare the outcomes.</li> </ol> <h5 id="measurejustenough"><br /></h5> <h5>Measure just enough</h5> <p>Because initial measurements often tell you quite a lot, and also change the value of continued measurement, Hubbard often aims for spending 10% of the EVPI on a measurement, and sometimes as little as 2% (especially for very large projects).</p> <h5 id="considertheerror"><br /></h5> <h5>Consider the error</h5> <p>It&rsquo;s important to be conscious of some common ways in which measurements can mislead.</p> <p>Scientists distinguish two types of measurement error: systemic and random. Random errors are random variations from one observation to the next. They can&rsquo;t be individually predicted, but they fall into patterns that can be accounted for with the laws of probability. Systemic errors, in contrast, are consistent. For example, the sales staff may routinely overestimate the next quarter&rsquo;s revenue by 50% (on average).</p> <p>We must also distinguish precision and accuracy. A &ldquo;precise&rdquo; measurement tool has low random error. E.g. if a bathroom scale gives the exact same displayed weight every time we set a particular book on it, then the scale has high precision. An &ldquo;accurate&rdquo; measurement tool has low systemic error. The bathroom scale, while precise, might be inaccurate if the weight displayed is systemically biased in one direction &ndash; say, eight pounds too heavy. A measurement tool can also have low precision but good accuracy, if it gives inconsistent measurements but they average to the true value.</p> <p>Random error tends to be easier to handle. Consider this example:</p> <blockquote> <p>For example, to determine how much time sales reps spend in meetings with clients versus other administrative tasks, they might choose a complete review of all time sheets&hellip; [But] if a complete review of 5,000 time sheets&hellip; tells us that sales reps spend 34% of their time in direct communication with customers, we still don&rsquo;t know how far from the truth it might be. Still, this &ldquo;exact&rdquo; number seems reassuring to many managers. Now, suppose a sample of direct observations of randomly chosen sales reps at random points in time finds that sales reps were in client meetings or on client phone calls only 13 out of 100 of those instances. (We can compute this without interrupting a meeting by asking as soon as the rep is available.) As we will see [later], in the latter case, we can statistically compute a 90% CI to be 7.5% to 18.5%. Even though this random sampling approach gives us only a range, we should prefer its findings to the census audit of time sheets. The census&hellip; gives us an exact number, but we have no way to know by how much and in which direction the time sheets err.</p> </blockquote> <p>Systemic error is also called a &ldquo;bias.&rdquo; Based on his experience, Hubbard suspects the three most important to avoid are:</p> <ol> <li>Confirmation bias: people see what they want to see.</li> <li>Selection bias: your sample might not be representative of the group you&rsquo;re trying to measure.</li> <li>Observer bias: the very act of observation can affect what you observe. E.g. 
in one study, researchers found that worker productivity improved no matter <em>what</em> they changed about the workplace. The workers seem to have been responding merely to the <em>fact</em> that they were being observed in <em>some</em> way.</li> </ol> <h5 id="chooseanddesignthemeasurementinstrument"><br /></h5> <h5>Choose and design the measurement instrument</h5> <p>After following the above steps, Hubbard writes, &ldquo;the measurement instrument should be almost completely formed in your mind.&rdquo; But if you still can&rsquo;t come up with a way to measure the target variable, here are some additional tips:</p> <ol> <li><em>Work through the consequences</em>. If the value is surprisingly high, or surprisingly low, what would you expect to see?</li> <li><em>Be iterative</em>. Start with just a few observations, and then recalculate the information value.</li> <li><em>Consider multiple approaches</em>. Your first measurement tool may not work well. Try others.</li> <li><em>What&rsquo;s the really simple question that makes the rest of the measurement moot?</em> First see if you can detect <em>any</em> change in research quality before trying to measure it more comprehensively.</li> </ol> <h4 id="samplingreality"><br /></h4> <h4>Sampling reality</h4> <p>In most cases, we&rsquo;ll estimate the values in a population by measuring the values in a small sample from that population. And for reasons discussed in chapter 7, a very small sample can often offer large reductions in uncertainty.</p> <p>There are a variety of tools we can use to build our estimates from small samples, and which one we should use often depends on how outliers are distributed in the population. In some cases, outliers are very close to the mean, and thus our estimate of the mean can converge quickly on the true mean as we look at new samples. In other cases, outliers can be several orders of magnitude away from the mean, and our estimate converges very slowly or not at all. Here are some examples:</p> <ul> <li>Very quick convergence, only 1&ndash;2 samples needed: cholesterol level of your blood, purity of public water supply, weight of jelly beans.</li> <li>Usually quick convergence, 5&ndash;30 samples needed: Percentage of customers who like the new product, failure loads of bricks, age of your customers, how many movies people see in a year.</li> <li>Potentially slow convergence: Software project cost overruns, factory downtime due to an accident.</li> <li>Maybe non-convergent: Market value of corporations, individual levels of income, casualties of wars, size of volcanic eruptions.</li> </ul> <p>Below, I survey just a few of the many sampling methods Hubbard covers in his book.</p> <h5 id="mathlessestimation"><br /></h5> <h5>Mathless estimation</h5> <p>When working with a quickly converging phenomenon and a symmetric distribution (uniform, normal, camel-back, or bow-tie) for the population, you can use the <a href="http://en.wikipedia.org/wiki/T-statistic">t-statistic</a> to develop a 90% CI even with very small samples. (See the book for instructions.)</p> <p>Or, even easier, make use of the <em>Rule of Five</em>: &ldquo;There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.&rdquo;</p> <p>The Rule of Five has another advantage over the t-statistic: it works for any distribution of values in the population, including ones with slow convergence or no convergence at all!
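<p>A quick check on the 93.75% figure, in case it looks mysterious: each sample independently has a 50% chance of landing above the population median, so the median escapes the sample range only when all five samples land on the same side of it, which happens with probability 2 &times; (1/2)<sup>5</sup> = 1/16, leaving 15/16 = 93.75%. The short simulation below agrees &ndash; Python and the heavy-tailed lognormal population are my choices for illustration, not Hubbard&rsquo;s.</p> <pre>
import numpy as np

rng = np.random.default_rng(0)
# 100,000 samples of size five from a heavy-tailed distribution whose true median is 1.0.
samples = rng.lognormal(mean=0.0, sigma=2.0, size=(100_000, 5))
covered = np.logical_and(samples.max(axis=1) >= 1.0, 1.0 >= samples.min(axis=1))
print(f"Share of samples whose range covers the true median: {covered.mean():.4f}")  # about 0.9375
</pre>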
The Rule of Five can do this because it gives us a confidence interval for the <em>median</em> rather than the <em>mean</em>, and it&rsquo;s the mean that is far more affected by outliers.</p> <p>Hubbard calls this a &ldquo;mathless&rdquo; estimation technique because it doesn&rsquo;t require us to take square roots or calculate standard deviation or anything like that. Moreover, this mathless technique extends beyond the Rule of Five: If we sample 8 items, there is a 99.2% chance that the median of the population falls within the largest and smallest values. If we take the <em>2nd</em> largest and smallest values (out of 8 total values), we get something close to a 90% CI for the median. Hubbard generalizes the tool with this handy reference table:</p> <p><img src="http://commonsenseatheism.com/wp-content/uploads/2013/08/mathless-90-percent-CI-for-median.jpg" alt="" align="center" /></p> <p>And if the distribution is symmetrical, then the mathless table gives us a 90% CI for the mean as well as for the median.</p> <h5 id="catch-recatch"><br /></h5> <h5>Catch-recatch</h5> <p>How does a biologist measure the number of fish in a lake? She catches and tags a sample of fish &ndash; say, 1000 of them &ndash; and then releases them. After the fish have had time to spread amongst the rest of the population, she&rsquo;ll catch another sample of fish. Suppose she caught 1000 fish again, and 50 of them were tagged. This would mean 5% of the fish were tagged, and thus that there were about 20,000 fish in the entire lake. (See Hubbard&rsquo;s book for the details on how to calculate the 90% CI.)</p> <h5 id="spotsampling"><br /></h5> <h5>Spot sampling</h5> <p>The fish example was a special case of a common problem: population proportion sampling. Often, we want to know what proportion of a population has a particular trait. How many registered voters in California are Democrats? What percentage of your customers prefer a new product design over the old one?</p> <p>Hubbard&rsquo;s book discusses how to solve the general problem, but for now let&rsquo;s just consider another special case: spot sampling.</p> <p>In spot sampling, you take random snapshots of things rather than tracking them constantly. What proportion of their work hours do employees spend on Facebook? To answer this, you &ldquo;randomly sample people through the day to see what they were doing <em>at that moment</em>. If you find that in 12 instances out of 100 random samples&rdquo; employees were on Facebook, you can guess they spend about 12% of their time on Facebook (the 90% CI is 8% to 18%).</p> <h5 id="clusteredsampling"><br /></h5> <h5>Clustered sampling</h5> <p>Hubbard writes:</p> <blockquote> <p>&ldquo;Clustered sampling&rdquo; is defined as taking a random sample of groups, then conducting a census or a more concentrated sampling within the group. For example, if you want to see what share of households has satellite dishes&hellip; it might be cost effective to randomly choose several city blocks, then conduct a complete census of everything in a block. (Zigzagging across town to individually selected households would be time consuming.) In such cases, we can&rsquo;t really consider the number of [households] in the groups&hellip; to be the number of random samples.
Within a block, households may be very similar&hellip; [and therefore] it might be necessary to treat the effective number of random samples as the number of blocks&hellip;</p> </blockquote> <h5 id="measuretothethreshold"><br /></h5> <h5>Measure to the threshold</h5> <p>For many decisions, one decision is required if a value is above some threshold, and another decision is required if that value is below the threshold. For such decisions, you don&rsquo;t care as much about a measurement that reduces uncertainty in general as you do about a measurement that tells you which decision to make based on the threshold. Hubbard gives an example:</p> <blockquote> <p>Suppose you needed to measure the average amount of time spent by employees in meetings that could be conducted remotely&hellip; If a meeting is among staff members who communicate regularly and for a relatively routine topic, but someone has to travel to make the meeting, you probably can conduct it remotely. You start out with your calibrated estimate that the median employee spends between 3% to 15% traveling to meetings that could be conducted remotely. You determine that if this percentage is actually over 7%, you should make a significant investment in tele meetings. The [EVPI] calculation shows that it is worth no more than $15,000 to study this. According to our rule of thumb for measurement costs, we might try to spend about $1,500&hellip;</p> </blockquote> <blockquote> <p>Let&rsquo;s say you sampled 10 employees and&hellip; you find that only 1 spends less time in these activities than the 7% threshold. Given this information, what is the chance that the median time spent in such activities is actually below 7%, in which case the investment would not be justified? One &ldquo;common sense&rdquo; answer is 1/10, or 10%. Actually&hellip; the real chance is much smaller.</p> </blockquote> <p>Hubbard shows how to derive the real chance in his book. The key point is that &ldquo;the uncertainty about the threshold can fall much faster than the uncertainty about the quantity in general.&rdquo;</p> <h5 id="regressionmodeling"><br /></h5> <h5>Regression modeling</h5> <p>What if you want to figure out the cause of something that has many possible causes? One method is to perform a <em>controlled experiment</em>, and compare the outcomes of a test group to a control group. Hubbard discusses this in his book (and yes, he&rsquo;s a Bayesian, and a skeptic of p-value hypothesis testing). For this summary, I&rsquo;ll instead mention another method for isolating causes: regression modeling. Hubbard explains:</p> <blockquote> <p>If we use regression modeling with historical data, we may not need to conduct a controlled experiment. Perhaps, for example, it is difficult to tie an IT project to an increase in sales, but we might have lots of data about how something <em>else</em> affects sales, such as faster time to market of new products. If we know that faster time to market is possible by automating certain tasks, that this IT investment eliminates certain tasks, and those tasks are on the critical path in the time-to-market, we can make the connection.</p> </blockquote> <p>Hubbard&rsquo;s book explains the basics of linear regressions, and of course gives the caveat that correlation does not imply causation. 
But, he writes, &ldquo;you should conclude that one thing causes another only if you have some <em>other</em> good reason besides the correlation itself to suspect a cause-and-effect relationship.&rdquo;</p> <h4 id="bayes"><br /></h4> <h4>Bayes</h4> <p>Hubbard&rsquo;s 10th chapter opens with a tutorial on Bayes&rsquo; Theorem. For an online tutorial, see <a href="http://yudkowsky.net/rational/bayes">here</a>.</p> <p>Hubbard then zooms out to a big-picture view of measurement, and recommends the &ldquo;instinctive Bayesian approach&rdquo;:</p> <ol> <li>Start with your calibrated estimate.</li> <li>Gather additional information (polling, reading other studies, etc.)</li> <li>Update your calibrated estimate subjectively, without doing any additional math.</li> </ol> <p>Hubbard says a few things in support of this approach. First, he points to some studies (e.g. <a href="http://www.tandfonline.com/doi/abs/10.1080/01621459.1995.10476620">El-Gamal &amp; Grether (1995)</a>) showing that people often reason in roughly-Bayesian ways. Next, he says that in his experience, people become better intuitive Bayesians when they (1) are made aware of the <a href="http://en.wikipedia.org/wiki/Base_rate_fallacy">base rate fallacy</a>, and when they (2) are better calibrated.</p> <p>Hubbard says that once these conditions are met,</p> <blockquote> <p>[then] humans seem to be mostly logical when incorporating new information into their estimates along with the old information. This fact is extremely useful because a human can consider qualitative information that does not fit in standard statistics. For example, if you were giving a forecast for how a new policy might change &ldquo;public image&rdquo; &ndash; measured in part by a reduction in customer complaints, increased revenue, and the like &ndash; a calibrated expert should be able to update current knowledge with &ldquo;qualitative&rdquo; information about how the policy worked for other companies, feedback from focus groups, and similar details. Even with sampling information, the calibrated estimator &ndash; who has a Bayesian instinct &ndash; can consider qualitative information on samples that most textbooks don&rsquo;t cover.</p> </blockquote> <p>He also offers a chart showing how a pure Bayesian estimator compares to other estimators:</p> <p><img src="http://commonsenseatheism.com/wp-content/uploads/2013/08/confidence-versus-information-emphasis.jpg" alt="" align="center" /></p> <p>Also, Bayes&rsquo; Theorem allows us to perform a &ldquo;Bayesian inversion&rdquo;:</p> <blockquote> <p>Given a particular observation, it may seem more obvious to frame a measurement by asking the question &ldquo;What can I conclude from this observation?&rdquo; or, in probabilistic terms, &ldquo;What is the probability X is true, given my observation?&rdquo; But Bayes showed us that we could, instead, start with the question, &ldquo;What is the probability of this observation if X were true?&rdquo;</p> </blockquote> <blockquote> <p>The second form of the question is useful because the answer is often more straightforward and it leads to the answer to the other question. 
It also forces us to think about the likelihood of different observations given a particular hypothesis and what that means for interpreting an observation.</p> </blockquote> <blockquote> <p>[For example] if, hypothetically, we know that only 20% of the population will continue to shop at our store, then we can determine the chance [that] exactly 15 out of 20 would say so&hellip; [The details are explained in the book.] Then we can invert the problem with Bayes&rsquo; theorem to compute the chance that only 20% of the population will continue to shop there given [that] 15 out of 20 said so in a random sample. We would find that chance to be very nearly zero&hellip;</p> </blockquote> <h4 id="othermethods"><br /></h4> <h4>Other methods</h4> <p>Other chapters discuss other measurement methods, for example prediction markets, Rasch models, methods for measuring preferences and happiness, methods for improving the subjective judgments of experts, and many others. </p> <h3 id="step5:makeadecisionandactonit"><br /></h3> <h3>Step 5: Make a decision and act on it</h3> <p>The last step will make more sense if we first &ldquo;bring the pieces together.&rdquo; Hubbard now organizes his consulting work with a firm into 3 phases, so let&rsquo;s review what we&rsquo;ve learned in the context of his 3 phases.</p> <h4 id="phase0:projectpreparation"><br /></h4> <h4>Phase 0: Project Preparation</h4> <ul> <li><em>Initial research</em>: Interviews and secondary research to get familiar on the nature of the decision problem.</li> <li><em>Expert identification</em>: Usually 4&ndash;5 experts who provide estimates.</li> </ul> <h4 id="phase1:decisionmodeling"><br /></h4> <h4>Phase 1: Decision Modeling</h4> <ul> <li><em>Decision problem definition</em>: Experts define the problem they&rsquo;re trying to analyze.</li> <li><em>Decision model detail</em>: Using an Excel spreadsheet, the AIE analyst elicits from the experts all the factors that matter for the decision being analyzed: costs and benefits, ROI, etc.</li> <li><em>Initial calibrated estimates</em>: First, the experts undergo calibration training. Then, they fill in the values (as 90% CIs or other probability distributions) for the variables in the decision model.</li> </ul> <h4 id="phase2:optimalmeasurements"><br /></h4> <h4>Phase 2: Optimal measurements</h4> <ul> <li><em>Value of information analysis</em>: Using Excel macros, the AIE analyst runs a value of information analysis on every variable in the model.</li> <li><em>Preliminary measurement method designs</em>: Focusing on the few variables with highest information value, the AIE analyst chooses measurement methods that should reduce uncertainty.</li> <li><em>Measurement methods</em>: Decomposition, random sampling, Bayesian inversion, controlled experiments, and other methods are used (as appropriate) to reduce the uncertainty of the high-VoI variables.</li> <li><em>Updated decision model</em>: The AIE analyst updates the decision model based on the results of the measurements.</li> <li><em>Final value of information analysis</em>: The AIE analyst runs a VoI analysis on each variable again. As long as this analysis shows information value much greater than the cost of measurement for some variables, measurement and VoI analysis continues in multiple iterations. 
Usually, though, only one or two iterations are needed before the VoI analysis shows that no further measurements are justified.</li> </ul> <h4 id="phase3:decisionoptimizationandthefinalrecommendation"><br /></h4> <h4>Phase 3: Decision optimization and the final recommendation</h4> <ul> <li><em>Completed risk/return analysis</em>: A final MC simulation shows the likelihood of possible outcomes.</li> <li><em>Identified metrics procedures</em>: Procedures are put in place to measure some variables (e.g. about project progress or external factors) continually.</li> <li><em>Decision optimization</em>: The final business decision recommendation is made (this is rarely a simple &ldquo;yes/no&rdquo; answer).</li> </ul> <h4 id="finalthoughts"><br /></h4> <h4>Final thoughts</h4> <p>Hubbard&rsquo;s book includes two case studies in which Hubbard describes how he led two fairly different clients (the EPA and U.S. Marine Corps) through each phase of the AIE process. Then, he closes the book with the following summary:</p> <ul> <li>If it&rsquo;s really that important, it&rsquo;s something you can define. If it&rsquo;s something you think exists at all, it&rsquo;s something you&rsquo;ve already observed somehow.</li> <li>If it&rsquo;s something important and something uncertain, you have a cost of being wrong and a chance of being wrong.</li> <li>You can quantify your current uncertainty with calibrated estimates.</li> <li>You can compute the value of additional information by knowing the &ldquo;threshold&rdquo; of the measurement where it begins to make a difference compared to your existing uncertainty.</li> <li>Once you know what it&rsquo;s worth to measure something, you can put the measurement effort in context and decide on the effort it should take.</li> <li>Knowing just a few methods for random sampling, controlled experiments, or even merely improving on the judgments of experts can lead to a significant reduction in uncertainty.</li> </ul> lukeprog ybYBCK9D7MZCcdArB 2013-08-07T04:05:58.366Z Algorithmic Progress in Six Domains https://www.lesswrong.com/posts/ueBMpvDsDEZgKiESt/algorithmic-progress-in-six-domains <p>Today MIRI released a new technical report by visiting researcher <a href="http://meteuphoric.wordpress.com/">Katja Grace</a>&nbsp;called "<strong><a href="http://intelligence.org/files/AlgorithmicProgress.pdf">Algorithmic Progress in Six Domains</a></strong>." The report summarizes data on algorithmic progress &ndash; that is, better performance per fixed amount of computing hardware &ndash; in six domains:</p> <ul> <li><span style="line-height: 13px;">SAT solvers,</span></li> <li><span style="line-height: 13px;">Chess and Go programs,</span></li> <li><span style="line-height: 13px;">Physics simulations,</span></li> <li><span style="line-height: 13px;">Factoring,</span></li> <li><span style="line-height: 13px;">Mixed integer programming, and</span></li> <li><span style="line-height: 13px;">Some forms of machine learning. </span></li> </ul> <p>MIRI's purpose for collecting these data was to shed light on the question of <a href="/lw/hbd/new_report_intelligence_explosion_microeconomics/">intelligence explosion microeconomics</a>, though we suspect the report will be of broad interest within the software industry and computer science academia.</p> <p>One finding from the report was previously discussed by Robin Hanson <a href="http://www.overcomingbias.com/2013/06/why-does-hardware-grow-like-algorithms.html">here</a>. 
(Robin saw an early draft on the intelligence explosion microeconomics <a href="https://docs.google.com/forms/d/1KElE2Zt_XQRqj8vWrc_rG89nrO4JtHWxIFldJ3IY_FQ/viewform">mailing list</a>.)</p> <p>This is the preferred page for discussing the report in general.</p> <p>Summary:</p> <blockquote>In recent <em>boolean satisfiability</em> (SAT) competitions, SAT solver performance has increased 5&ndash;15% per year, depending on the type of problem. However, these gains have been driven by widely varying improvements on particular problems. Retrospective surveys of SAT performance (on problems chosen after the fact) display significantly faster progress.</blockquote> <blockquote><em>Chess programs</em> have improved by around 50 Elo points per year over the last four decades. Estimates for the significance of hardware improvements are very noisy, but are consistent with hardware improvements being responsible for approximately half of progress. Progress has been smooth on the scale of years since the 1960s, except for the past five. <em>Go programs</em> have improved about one stone per year for the last three decades. Hardware doublings produce diminishing Elo gains, on a scale consistent with accounting for around half of progress.</blockquote> <blockquote>Improvements in a variety of <em>physics simulations</em> (selected after the fact to exhibit performance increases due to software) appear to be roughly half due to hardware progress.</blockquote> <blockquote>The <em>largest number factored</em> to date has grown by about 5.5 digits per year for the last two decades; computing power increased 10,000-fold over this period, and it is unclear how much of the increase is due to hardware progress.</blockquote> <blockquote>Some <em>mixed integer programming</em> (MIP) algorithms, run on modern MIP instances with modern hardware, have roughly doubled in speed each year. MIP is an important optimization problem, but one which has been called to attention after the fact due to performance improvements. Other optimization problems have had more inconsistent (and harder to determine) improvements.</blockquote> <blockquote>Various forms of <em>machine learning</em> have had steeply diminishing progress in percentage accuracy over recent decades. Some vision tasks have recently seen faster progress.</blockquote> lukeprog ueBMpvDsDEZgKiESt 2013-08-03T02:29:21.928Z MIRI's 2013 Summer Matching Challenge https://www.lesswrong.com/posts/6CnoNSworudoxJZtb/miri-s-2013-summer-matching-challenge <p><small>(<a href="http://intelligence.org/">MIRI</a>&nbsp;maintains Less Wrong, with generous help from <a href="http://trikeapps.com/">Trike Apps</a>, and much of the core content is written by salaried MIRI staff members.)</small></p> <p><strong>Update 09-15-2013</strong>: The fundraising drive has been completed! My thanks to everyone who contributed.</p> <p>The original post follows below...</p> <p>&nbsp;</p> <p>&nbsp;</p> <p>&nbsp;</p> <p>&nbsp;</p> <p>Thanks to the generosity of several major donors,<sup>&dagger;</sup> every donation to the Machine Intelligence Research Institute made from now until (the end of) August 15th, 2013 will be <strong>matched dollar-for-dollar</strong>, up to a total of $200,000! 
&nbsp;</p> <p style="font-size: 300%;" align="center"><strong><a href="http://intelligence.org/donate/">Donate Now!</a></strong></p> <p>Now is your chance to <strong>double your impact</strong> while helping us raise up to $400,000 (with matching) to fund&nbsp;<a href="http://intelligence.org/research/">our research program</a>.</p> <p>This post is also a good place to <em>ask your questions</em> about our activities and plans &mdash; just post a comment!</p> <p>If you have questions about what your dollars will do at MIRI, you can also schedule a quick call with MIRI Deputy Director Louie Helm: louie@intelligence.org (email), 510-717-1477 (phone),&nbsp;louiehelm (Skype).</p> <p align="center"><a href="http://intelligence.org/donate/"><img src="http://intelligence.org/files/progress.png" alt="progress bar" /></a></p> <hr /> <p>Early this year we made a transition from movement-building to research, and we've&nbsp;<em>hit the ground running</em> with six major new research papers, six new strategic analyses on our blog, and much more. Give now to support our ongoing work on <a href="http://intelligence.org/2013/06/05/friendly-ai-research-as-effective-altruism/">the future's most important problem</a>.</p> <h3><span>Accomplishments in 2013 so far</span></h3> <ul> <li>Released <strong>six new research papers</strong>: (1)&nbsp;<a href="/lw/h1k/reflection_in_probabilistic_set_theory/">Definability of Truth in Probabilistic Logic</a>, (2)&nbsp;<a href="http://intelligence.org/files/IEM.pdf">Intelligence Explosion Microeconomics</a>, (3)&nbsp;<a href="http://intelligence.org/files/TilingAgents.pdf">Tiling Agents for Self-Modifying AI</a>, (4)&nbsp;<a href="http://intelligence.org/files/RobustCooperation.pdf">Robust Cooperation in the Prisoner's Dilemma</a>, (5)&nbsp;<a href="http://intelligence.org/files/Comparison.pdf">A Comparison of Decision Algorithms on Newcomblike Problems</a>, and (6)&nbsp;<a href="http://intelligence.org/2013/07/08/responses-to-catastrophic-agi-risk-a-survey/">Responses to Catastrophic AGI Risk: A Survey</a>.</li> <li>Held our <a href="http://intelligence.org/2013/03/07/upcoming-miri-research-workshops/">2nd</a> and <a href="http://intelligence.org/2013/06/07/miris-july-2013-workshop/">3rd</a> research workshops.</li> <li>Published <strong>six new analyses</strong> to our blog: <a href="http://intelligence.org/2013/04/04/the-lean-nonprofit/">The Lean Nonprofit</a>, <a href="http://intelligence.org/2013/05/01/agi-impacts-experts-and-friendly-ai-experts/">AGI Impact Experts and Friendly AI Experts</a>, <a href="http://intelligence.org/2013/05/05/five-theses-two-lemmas-and-a-couple-of-strategic-implications/">Five Theses...</a>, <a href="http://intelligence.org/2013/05/15/when-will-ai-be-created/">When Will AI Be Created?</a>, <a href="http://intelligence.org/2013/06/05/friendly-ai-research-as-effective-altruism/">Friendly AI Research as Effective Altruism</a>, and <a href="http://intelligence.org/2013/06/19/what-is-intelligence-2/">What is Intelligence?</a></li> <li>Published the&nbsp;<em><a href="http://intelligenceexplosion.com/ebook/">Facing the Intelligence Explosion</a>&nbsp;</em>ebook.</li> <li>Published several other substantial articles:&nbsp;<a href="http://intelligence.org/courses/">Recommended Courses for MIRI Researchers</a>, <a href="/lw/gu1/decision_theory_faq/">Decision Theory FAQ</a>,&nbsp;<a href="/lw/gln/a_brief_history_of_ethically_concerned_scientists/">A brief history of ethically concerned scientists</a>, <a 
href="/lw/gzq/bayesian_adjustment_does_not_defeat_existential/">Bayesian Adjustment Does Not Defeat Existential Risk Charity</a>,&nbsp;and others.</li> <li>Published our first three expert interviews, with <a href="http://intelligence.org/2013/07/12/james-miller-interview/">James Miller</a>, <a href="http://intelligence.org/2013/07/15/roman-interview/">Roman Yampolskiy</a>, and <a href="http://intelligence.org/2013/07/17/beckstead-interview/">Nick Beckstead</a>.</li> <li>Launched our new website at intelligence.org as part of&nbsp;<a href="http://intelligence.org/2013/01/30/we-are-now-the-machine-intelligence-research-institute-miri">changing our name</a>&nbsp;to MIRI.</li> <li>Relocated to&nbsp;<a href="http://intelligence.org/2013/07/08/miri-has-moved/">new offices</a>...&nbsp;2 blocks from UC Berkeley, which is&nbsp;ranked&nbsp;<a href="http://www.usnews.com/education/worlds-best-universities-rankings/best-universities-mathematics">5th</a>&nbsp;in the world in mathematics, and&nbsp;<a href="http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-science-schools/logic-rankings">1st</a>&nbsp;in the world in mathematical logic.</li> <li>And of course <em>much</em> more.</li> </ul> <h3><span>Future Plans You Can Help Support</span></h3> <ul> <li>We will host many more research workshops, including <a href="http://intelligence.org/2013/07/07/miris-september-2013-workshop/ &lrm;">one in September</a>&nbsp;in Berkeley, one in December (with <a href="http://math.ucr.edu/home/baez/">John Baez</a> attending) in Berkeley, and one in Oxford, UK (dates TBD).</li> <li>Eliezer will continue to publish about open problems in Friendly AI. (Here is&nbsp;<a href="/lw/hbd/new_report_intelligence_explosion_microeconomics/">#1</a>&nbsp;and&nbsp;<a href="/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/">#2</a>.)</li> <li>We will continue to publish strategic analyses and expert interviews, mostly via <a href="http://intelligence.org/blog/">our blog</a>.</li> <li>We will publish nicely-edited ebooks (Kindle, iBooks, and PDF) for more of our materials, to make them more accessible: <em><a href="http://wiki.lesswrong.com/wiki/Sequences">The Sequences, 2006-2009</a>&nbsp;</em>and <em><a href="http://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom_Debate">The Hanson-Yudkowsky AI Foom Debate</a></em>.</li> <li>We will continue to set up the infrastructure (e.g. <a href="http://intelligence.org/2013/07/08/miri-has-moved/">new offices</a>, researcher endowments) required to host a productive Friendly AI research team, and (over several years) recruit enough top-level math talent to launch it.</li> <li>We hope to hire an experienced development director (job ad not yet posted), so that the contributions of our current supporters can be multiplied even further by a professional fundraiser.</li> </ul> <p>(Other projects are still being surveyed for likely cost and strategic impact.)</p> <p>We appreciate your support for our high-impact work! 
Donate now, and seize a better than usual chance to move our work forward.</p> <p>If you have questions about donating, please contact Louie Helm at (510) 717-1477 or&nbsp;<a href="mailto:louie@intelligence.org">louie@intelligence.org</a>.</p> <p><sup>&dagger;</sup> $200,000 of total matching funds has been provided by Jaan Tallinn, Loren Merritt, Rick Schwall, and Alexei Andreev.</p> lukeprog 6CnoNSworudoxJZtb 2013-07-23T19:05:56.873Z Model Combination and Adjustment https://www.lesswrong.com/posts/iyRpsScBa6y4rduEt/model-combination-and-adjustment <p><a href="http://www.scholarpedia.org/article/Ensemble_learning"><img style="float: right; padding: 10px;" src="http://commonsenseatheism.com/wp-content/uploads/2013/07/Combining-classifiers.jpg" alt="" /></a>The debate on the <a href="/lw/gvk/induction_or_the_rules_and_etiquette_of_reference/">proper use</a> of inside and outside views has raged for some time now. I suggest a way forward, building on a family of methods commonly used in statistics and machine learning to address this issue &mdash; an approach I'll call "model combination and adjustment."</p> <p>&nbsp;</p> <h3>Inside and outside views: a quick review</h3> <p><strong>1</strong>. There are two ways you might predict outcomes for a phenomenon. If you make your predictions using a detailed visualization of how something works, you're using an <em>inside view</em>. If instead you ignore the details of how something works, and instead make your predictions by assuming that a phenomenon will behave roughly like other similar phenomena, you're using an <em>outside view</em> (also called <em>reference class forecasting</em>).</p> <p>Inside view examples:</p> <ul> <li>"When I break the project into steps and visualize how long each step will take, it looks like the project will take 6 weeks"</li> <li>"When I combine what I know of physics and computation, it looks like the serial speed formulation of Moore's Law will break down around 2005, because we haven't been able to scale down energy-use-per-computation as quickly as we've scaled up computations per second, which means the serial speed formulation of Moore's Law will run into roadblocks from energy consumption and heat dissipation somewhere around 2005."</li> </ul> <p>Outside view examples:</p> <ul> <li>"I'm going to ignore the details of this project, and instead compare my project to similar projects. Other projects like this have taken 3 months, so that's probably about how long my project will take."</li> <li>"The serial speed formulation of Moore's Law has held up for several decades, through several different physical architectures, so it'll probably continue to hold through the next shift in physical architectures."</li> </ul> <p><small>See also chapter 23 in <a href="http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555/">Kahneman (2011)</a>; <a href="/lw/jg/planning_fallacy/">Planning Fallacy</a>; <a href="http://en.wikipedia.org/wiki/Reference_class_forecasting">Reference class forecasting</a>. Note that, after several decades of past success, the serial speed formulation of Moore's Law did in fact break down in 2004 for the reasons described (<a href="http://www.inf.pucrs.br/~moraes/prototip/artigos/ComputingPerformanceGameOverorNextLevel.pdf">Fuller &amp; Millett 2011</a>).</small></p> <p><a id="more"></a></p> <p>&nbsp;</p> <p><strong>2</strong>. An outside view works best when using a reference class with a <em>similar causal structure</em> to the thing you're trying to predict. 
An inside view works best when a phenomenon's causal structure is well-understood, and when (to your knowledge) there are very few phenomena with a similar causal structure that you can use to predict things about the phenomenon you're investigating. See: <a href="/lw/ri/the_outside_views_domain/">The Outside View's Domain</a>.</p> <p>When writing a textbook that's much like other textbooks, you're probably best off predicting the cost and duration of the project by looking at similar textbook-writing projects. When you're predicting the trajectory of the serial speed formulation of Moore's Law, or predicting which spaceship designs will successfully land humans on the moon for the first time, you're probably best off using an (intensely <em>informed</em>) inside view.</p> <p>&nbsp;</p> <p><strong>3</strong>. Some things aren't very predictable with <em>either</em> an outside view or an inside view. Sometimes, the thing you're trying to predict seems to have a significantly different causal structure than other things, <em>and</em> you don't understand its causal structure very well. What should we do in such cases? This remains a matter of debate.</p> <p>Eliezer Yudkowsky recommends a <a href="/lw/vz/the_weak_inside_view/">weak inside view</a> for such cases:</p> <blockquote> <p>On problems that are drawn from a barrel of causally similar problems, where human optimism runs rampant and unforeseen troubles are common, the Outside View beats the Inside View... [But] on problems that are new things under the Sun, where there's a huge change of context and a structural change in underlying causal forces, the Outside View also fails - try to use it, and you'll just get into arguments about what is the proper domain of "similar historical cases" or what conclusions can be drawn therefrom. In this case, the best we can do is use the Weak Inside View &mdash; visualizing the causal process &mdash; to produce <em>loose qualitative conclusions about only those issues where there seems to be lopsided support</em>.</p> </blockquote> <p>In contrast, Robin Hanson <a href="http://www.overcomingbias.com/2008/12/test-near-apply.html">recommends</a> an outside view for difficult cases:</p> <blockquote> <p>It is easy, way too easy, to generate new mechanisms, accounts, theories, and abstractions. To see if such things are <em>useful</em>, we need to vet them, and that is easiest "nearby", where we know a lot. When we want to deal with or understand things "far", where we know little, we have little choice other than to rely on mechanisms, theories, and concepts that have worked well near. Far is just the wrong place to try new things.</p> <p>There are a bazillion possible abstractions we could apply to the world. For each abstraction, the question is not whether one <em>can</em> divide up the world that way, but whether it "carves nature at its joints", giving <em>useful</em> insight not easily gained via other abstractions. We should be wary of inventing new abstractions just to make sense of things far; we should insist they first show their value nearby.</p> </blockquote> <p>In <a href="http://intelligence.org/files/IEM.pdf">Yudkowsky (2013)</a>, sec. 2.1, Yudkowsky offers a reply to these paragraphs, and continues to advocate for a weak inside view. 
He also adds:</p> <blockquote> <p>the other major problem I have with the &ldquo;outside view&rdquo; is that everyone who uses it seems to come up with a different reference class and a different answer.</p> </blockquote> <p>This is the problem of "<a href="/lw/1p5/outside_view_as_conversationhalter/">reference class tennis</a>": each participant in the debate claims their own reference class is most appropriate for predicting the phenomenon under discussion, and if disagreement remains, they might each say "I&rsquo;m taking my reference class and going home."</p> <p>Responding to the same point made <a href="/lw/cze/reply_to_holden_on_tool_ai/">elsewhere</a>, Robin Hanson <a href="http://www.overcomingbias.com/2013/02/foom-debate-again.html">wrote</a>:</p> <blockquote> <p>[Earlier, I] warned against over-reliance on &ldquo;unvetted&rdquo; abstractions. I wasn&rsquo;t at all trying to claim there is one true analogy and all others are false. Instead, I argue for preferring to rely on abstractions, including categories and similarity maps, that have been found useful by a substantial intellectual community working on related problems.</p> </blockquote> <h3><br /></h3> <h3>Multiple reference classes</h3> <p>Yudkowsky (2013) adds one more complaint about reference class forecasting in difficult forecasting circumstances:</p> <blockquote> <p>A final problem I have with many cases of 'reference class forecasting' is that... [the] final answers [generated from this process] often seem more specific than I think our state of knowledge should allow. [For example,] I don&rsquo;t think you <em>should</em> be able to tell me that the next major growth mode will have a doubling time of between a month and a year. The alleged outside viewer claims to know too much, once they stake their all on a single preferred reference class.</p> </blockquote> <p>Both this comment and Hanson's last comment above point to the vulnerability of relying on any <em>single</em> reference class, at least for difficult forecasting problems. <a href="http://rationalaltruist.com/2013/05/08/beware-brittle-arguments/">Beware brittle arguments</a>, says Paul Christiano.</p> <p>One obvious solution is to use <em>multiple</em> reference classes, and weight them by how relevant you think they are to the phenomenon you're trying to predict. Holden Karnofsky writes of investigating things from "<a href="http://blog.givewell.org/2011/11/10/maximizing-cost-effectiveness-via-critical-inquiry/">many different angles</a>." Jonah Sinick refers to "<a href="/lw/hmb/many_weak_arguments_vs_one_relatively_strong/">many weak arguments</a>." Statisticians call this "<a href="http://commonsenseatheism.com/wp-content/uploads/2013/07/Xu-Golay-Survey-of-model-selection-and-model-combination.pdf">model combination</a>." Machine learning researchers call it "<a href="http://www.scholarpedia.org/article/Ensemble_learning">ensemble learning</a>" or "<a href="http://www.acsu.buffalo.edu/~tulyakov/papers/tulyakov_MLDAR_comb_review.pdf">classifier combination</a>."</p> <p>In other words, we can use <em>many</em> outside views.</p> <p>Nate Silver does this when he predicts elections (see <a href="http://www.amazon.com/The-Signal-Noise-Many-Predictions/dp/159420411X/">Silver 2012</a>, ch. 2). Venture capitalists do this when they evaluate startups. 
The best political forecasters studied in <a href="http://www.amazon.com/Expert-Political-Judgment-Good-Know/dp/0691128715/">Tetlock (2005)</a>, the "foxes," tended to do this.</p> <p>In fact, most of us do this regularly.</p> <p>How do you predict which restaurant's food you'll most enjoy, when visiting San Francisco for the first time? One outside view comes from the restaurant's Yelp reviews. Another outside view comes from your friend Jade's opinion. Another outside view comes from the fact that you usually enjoy Asian cuisines more than other cuisines. And so on. Then you <em>combine</em> these different models of the situation, weighting them by how robustly they each tend to predict your eating enjoyment, and you grab a taxi to <a href="http://www.oshathai.com/">Osha Thai</a>.</p> <p><small>(Technical note: I say "model combination" rather than "model averaging" <a href="http://synapse.cs.byu.edu/papers/Kristine.ijcnn2011.pdf">on purpose</a>.)</small></p> <h3><br /></h3> <h3>Model combination and adjustment</h3> <p>You can probably do even better than this, though &mdash; if you know some things about the phenomenon and you're very careful. Once you've combined a handful of models to arrive at a qualitative or quantitative judgment, you should still be able to "adjust" the judgment in some cases using an inside view.</p> <p>For example, suppose I used the above process, and I plan to visit Osha Thai for dinner. Then, somebody gives me my first taste of the <em><a href="http://en.wikipedia.org/wiki/Synsepalum_dulcificum">Synsepalum dulcificum</a></em> fruit. I happen to know that this fruit contains a molecule called <a href="http://en.wikipedia.org/wiki/Miraculin">miraculin</a> which binds to one's tastebuds and makes sour foods taste sweet, and that this effect lasts for about an hour (<a href="http://commonsenseatheism.com/wp-content/uploads/2013/07/Koizumi-et-al-Human-sweet-taste-receptor-mediates-acide-induced-sweetness-of-miraculin.pdf">Koizumi et al. 2011</a>). Despite the results of my earlier model combination, I predict I won't particularly enjoy Osha Thai at the moment. Instead, I decide to try some tabasco sauce, to see whether it <a href="http://www.nytimes.com/2008/05/28/dining/28flavor.html">now tastes like doughnut glaze</a>.</p> <p>In some cases, you might also need to <a href="http://blog.givewell.org/2011/08/18/why-we-cant-take-expected-value-estimates-literally-even-when-theyre-unbiased/">adjust for your prior</a> over, say, "expected enjoyment of restaurant food," if for some reason your original model combination procedure didn't capture your prior properly.</p>
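<p><small>(Here is a minimal sketch, in Python, of the kind of weighted model combination described above, using the restaurant example. All of the numbers, and the choice to represent each outside view as a distribution over a 1&ndash;5 enjoyment rating, are invented purely for illustration.)</small></p> <pre><code># Each "outside view" is a full probability distribution over how much you'll
# enjoy the meal (a 1-5 rating), plus a weight reflecting how robustly models
# of that kind have predicted your enjoyment in the past.
outside_views = [
    # P(rating=1), ..., P(rating=5)
    {"name": "Yelp reviews",      "weight": 0.5, "dist": [0.05, 0.05, 0.15, 0.45, 0.30]},
    {"name": "Jade's opinion",    "weight": 0.3, "dist": [0.00, 0.10, 0.40, 0.40, 0.10]},
    {"name": "cuisine base rate", "weight": 0.2, "dist": [0.05, 0.15, 0.30, 0.35, 0.15]},
]

def combine(views):
    """Combine several models as a weighted mixture of their distributions."""
    total = sum(v["weight"] for v in views)
    size = len(views[0]["dist"])
    return [sum(v["weight"] * v["dist"][i] for v in views) / total for i in range(size)]

mixture = combine(outside_views)
expected_rating = sum((i + 1) * p for i, p in enumerate(mixture))
print([round(p, 3) for p in mixture])  # combined distribution over ratings
print(round(expected_rating, 2))       # about 3.7 out of 5

# An inside-view "adjustment" (e.g. knowing you just ate miraculin) would then
# shift or override this combined estimate rather than being folded into the mixture.
</code></pre> <p><small>(Keeping full distributions, rather than collapsing each view to a point estimate and averaging, preserves the disagreement between the component models.)</small></p>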
and "<em>an</em> outside view..."</p> <p>My reasons are:</p> <ol> <li> <p>Speaking of "the" outside view privileges a particular reference class, which could make us overconfident of that particular model's predictions, and leave model uncertainty unaccounted for.</p> </li> <li> <p>Speaking of "the" outside view <a href="/lw/1p5/outside_view_as_conversationhalter/">can act as a conversation-stopper</a>, whereas speaking of multiple outside views encourages further discussion about how much weight each model should be given, and what each of them implies about the phenomenon under discussion.</p> </li> </ol> lukeprog iyRpsScBa6y4rduEt 2013-07-17T20:31:08.687Z Writing Style and the Typical Mind Fallacy https://www.lesswrong.com/posts/7Q3MoE9YzFMxGZR7F/writing-style-and-the-typical-mind-fallacy <p>For a long time, Eliezer has been telling me I should write more like he does. I've mostly resisted, preferring instead to write like this:</p> <div><ol> <li>Explain the lesson of the post immediately, and outline the ideas clearly with lots of headings, subheadings, lists, etc.</li> <li>State the abstract points first, then give concrete examples.</li> <li>Provide lots of links and references to related work so that readers have the opportunity to read more detail about what I'm trying to say (in case it wasn't clear in a single sentence or paragraph), or read the same thing from a different angle (in case the metaphors and language I used weren't clear to that reader).</li> </ol> <div>Eliezer talks as though his style is simply&nbsp;<em>better writing</em>, while I've complained that I often can't even tell what his posts are <em>saying</em>.</div> </div> <div><br /></div> <div>I'm a bit embarrassed to admit that it wasn't until sometime last month that I realized that,&nbsp;<em>obviously</em>, different people prefer each style, and Eliezer and I were both falling prey to the&nbsp;<a href="http://wiki.lesswrong.com/wiki/Typical_mind_fallacy">typical mind fallacy</a>.</div> <div><br /></div> <p>&nbsp;</p> <p>At the recent <a href="http://www.effectivealtruismsummit.com/">Effective Altruism Summit</a> I tried to figure out which personal features predicted writing style preference.</p> <p>One hypothesis was that people who read lots of fiction (like Eliezer) will tend to prefer Eliezer's story-like style, while those who read almost exclusively non-fiction (like me) will tend to prefer my "just gimme the facts" style. This hypothesis didn't hold up well on my non-scientific survey of ~10 LW-reading effective altruists.</p> <p>Another hypothesis was that most people would prefer Eliezer's more exciting posts, while people trained in the sciences or analytic philosophy (which insist on clear organization, definitions, references to related work, etc.) would prefer my posts. This hypothesis fared a bit better, but not by much.</p> <p>Another hypothesis was that people who had acquired an <a href="/lw/dxr/epiphany_addiction/">epiphany addiction</a> would prefer Eliezer's style, whereas those who just want to <a href="http://metamodern.com/2009/05/17/how-to-understand-everything-and-why/">learn everything</a>&nbsp;<a href="/lw/5me/scholarship_how_to_do_it_efficiently/">efficiently</a> would prefer my style. But I didn't test this.</p> <p>Another hypothesis that occurs to me is that people with short attention spans could prefer my more skimmable style. 
But I haven't tested this.</p> <p>Perhaps the community would like to propose some hypotheses, and test them with LW polling?</p> lukeprog 7Q3MoE9YzFMxGZR7F 2013-07-14T04:47:48.167Z Four Focus Areas of Effective Altruism https://www.lesswrong.com/posts/JmmA2Mf5GrY9D6nQD/four-focus-areas-of-effective-altruism <p><img style="float: right; padding: 10px;" src="http://commonsenseatheism.com/wp-content/uploads/2013/07/effective-altruism-small.jpg" alt="" />It was a pleasure to see all major strands of the <a href="http://www.ted.com/talks/peter_singer_the_why_and_how_of_effective_altruism.html">effective altruism movement</a> gathered in one place at last week's <a href="http://www.effectivealtruismsummit.com/">Effective Altruism Summit</a>.</p> <p>Representatives from&nbsp;<a href="http://www.givewell.org/">GiveWell</a>, <a href="http://www.thelifeyoucansave.org/">The Life You Can Save</a>, <a href="http://80000hours.org/">80,000 Hours</a>, <a href="http://www.givingwhatwecan.org/">Giving What We Can</a>, <a href="http://www.effectiveanimalactivism.org/">Effective Animal Altruism</a>,&nbsp;<a href="http://www.leverageresearch.org/">Leverage Research</a>, the <a href="http://www.thehighimpactnetwork.org/">Center for Applied Rationality</a>, and the <a href="http://intelligence.org/">Machine Intelligence Research Institute</a> either attended or gave presentations.&nbsp;My thanks to Leverage Research&nbsp;for organizing and hosting the event!</p> <p>What do all these groups have in common? As <a href="http://en.wikipedia.org/wiki/Peter_Singer">Peter Singer</a> said in&nbsp;<a href="http://www.ted.com/talks/peter_singer_the_why_and_how_of_effective_altruism.html">his TED talk</a>, effective altruism "combines both the heart and the head." The heart motivates us to be empathic and altruistic toward others, while the head can "make sure that what [we] do is effective and well-directed," so that altruists can do not just&nbsp;<em>some</em> good but&nbsp;<em>as much good as possible</em>.</p> <p>Effective altruists (EAs) tend to:</p> <ol> <li><span style="line-height: 13px;"><strong>Be globally altruistic</strong>:&nbsp;</span><span style="line-height: 13px;">EAs care about people equally, regardless of location. Typically, the most cost-effective altruistic cause won't happen to be in one's home country.</span></li> <li><strong>Value consequences</strong>: EAs tend to value causes according to their consequences, whether those consequences are happiness, health, justice, fairness and/or other values.</li> <li><strong>Try to do as much good as possible</strong>: EAs don't just want to do <em>some</em> good; they want to do (roughly)&nbsp;<em>as much good as possible</em>. As such, they hope to devote their altruistic resources (time, money, energy, attention) to unusually cost-effective causes. 
(This <a href="http://blog.givewell.org/2011/08/18/why-we-cant-take-expected-value-estimates-literally-even-when-theyre-unbiased/">doesn't</a> necessarily mean that EAs think "explicit" cost effectiveness calculations are the best <em>method for&nbsp;figuring out</em> which causes are likely to do the most good.)</li> <li><strong>Think scientifically and quantitatively</strong>: EAs tend to be analytic, scientific, and quantitative when trying to figure out which causes <em>actually</em> do the most good.</li> <li><strong>Be willing to make significant life changes to be more effectively altruistic</strong>: As a result of their efforts to be more effective in their altruism, EAs often (1) change which charities they support financially, (2) change careers, (3) spend significant chunks of time investigating which causes are most cost-effective according to their values, or (4) make other significant life changes.</li> </ol> <p>Despite these similarities, EAs are a diverse bunch, and they focus their efforts on a variety of causes.</p> <p>Below are four popular focus areas of effective altruism, ordered roughly by how large and visible they appear to be at the moment. Many EAs work on several of these focus areas at once, due to uncertainty about both facts and values.</p> <p>Though labels and categories <a href="/lw/od/37_ways_that_words_can_be_wrong/">have</a> <a href="http://www.paulgraham.com/identity.html">their</a> <a href="http://www.psychologytoday.com/blog/alternative-truths/201005/why-its-dangerous-label-people">dangers</a>, they can also enable <a href="http://en.wikipedia.org/wiki/Chunking_(psychology)">chunking</a>, which has benefits for memory, learning, and communication. There are many other ways we might categorize the efforts of today's EAs; this is only one categorization.</p> <h4><a id="more"></a><br /></h4> <h4>Focus Area 1: Poverty Reduction</h4> <p>Here, "poverty reduction" is meant in a broad sense that includes (e.g.) economic benefit, better health, and better education.</p> <p>Major organizations in this focus area include:</p> <ul> <li><a href="http://www.givewell.org/">GiveWell</a>&nbsp;is home to the most rigorous research on charitable causes, especially poverty reduction and global health. Their current charity recommendations are the <a href="http://www.againstmalaria.com/">Against Malaria Foundation</a>, <a href="http://www.givedirectly.org/">GiveDirectly</a>, and the <a href="http://www3.imperial.ac.uk/schisto">Schistosomiasis Control Initiative</a>. (Note that GiveWell also does quite a bit of "meta effective altruism"; see below.)</li> <li><a href="http://www.goodventures.org/">Good Ventures</a>&nbsp;works closely with GiveWell.</li> <li><a href="http://www.thelifeyoucansave.org/">The Life You Can Save</a>&nbsp;(TLYCS), named after Peter Singer's <a href="http://www.amazon.com/The-Life-You-Can-Save/dp/0812981561/">book</a> on effective altruism, encourages people to pledge a fraction of their income to effective charities. TLYCS currently recommends GiveWell's recommended charities and <a href="http://www.thelifeyoucansave.org/WheretoDonate.aspx">several others</a>.</li> <li><a href="http://www.givingwhatwecan.org/">Giving What We Can</a>&nbsp;(GWWC) does some charity evaluation and also encourages people to pledge 10% of their income effective charities. 
GWWC currently recommends two of GiveWell's recommended charities and <a href="http://www.givingwhatwecan.org/where-to-give/recommended-charities">two others</a>.</li> <li><a href="http://www.aidgrade.org/">AidGrade</a> evaluates the cost effectiveness of poverty reduction causes, with less of a focus on individual organizations.</li> </ul> <p>In addition, some well-endowed foundations seem to have "one foot" in effective poverty reduction. For example, the <a href="http://www.gatesfoundation.org/">Bill &amp; Melinda Gates Foundation</a> has funded many of the most cost-effective causes in the developing world (e.g. vaccinations),&nbsp;although it also funds less cost-effective-seeming interventions in the developed world.</p> <p>In the future, poverty reduction EAs might&nbsp;also focus on economic, political, or research-infrastructure changes that might achieve poverty reduction, global health, and educational improvements more&nbsp;indirectly, as when&nbsp;<a href="http://en.wikipedia.org/wiki/Chinese_economic_reform">Chinese economic reforms</a>&nbsp;lifted hundreds of millions out of poverty. Though it is generally easier to evaluate the cost-effectiveness of direct efforts than that of indirect efforts, some groups (e.g. <a href="http://blog.givewell.org/2013/05/30/refining-the-goals-of-givewell-labs/">GiveWell Labs</a>&nbsp;and&nbsp;<a href="http://www.vannevargroup.org/">The Vannevar Group</a>) are beginning to evaluate the likely cost-effectiveness of these causes. &nbsp;</p> <h4><br /></h4> <h4>Focus Area 2: Meta Effective Altruism</h4> <p>Meta effective altruists focus less on specific causes and more on "meta" activities such as raising awareness of the importance of evidence-based altruism, helping EAs reach their potential, and doing research to help EAs decide which focus areas they should contribute to.</p> <p>Organizations in this focus area include:</p> <ul> <li><a href="http://80000hours.org/">80,000 Hours</a>&nbsp;highlights the importance of helping the world effectively through one's career. They also offer personal counseling to help EAs choose a career and a set of causes to support.</li> <li>Explicitly, the&nbsp;<a href="http://rationality.org/">Center for Applied Rationality</a>&nbsp;(CFAR) just trains people in rationality skills. But&nbsp;<em>de facto</em>&nbsp;they are especially focused on the application of rational thought to the practice of altruism, and are deeply embedded in the effective altruism community.</li> <li><a href="http://www.leverageresearch.org/">Leverage Research</a> focuses on growing and empowering the EA movement, e.g. by running <a href="http://www.effectivealtruismsummit.com/">Effective Altruism Summit</a>, by organizing the <a href="http://www.thehighimpactnetwork.org/">THINK</a> student group network, and by searching for "mind hacks" (like the <a href="http://en.wikipedia.org/wiki/Method_of_loci">memory palace</a>) that can make EAs more effective.</li> </ul> <p>Other people and organizations contribute to meta effective altruism, too. Paul Christiano examines effective altruism from a high level at <a href="http://rationalaltruist.com/">Rational Altruist</a>. 
GiveWell and others often write about the <a href="http://blog.givewell.org/2013/06/11/the-moral-case-for-giving-doesnt-rely-on-questionable-quantitative-estimates/">ethics</a> and <a href="http://blog.givewell.org/2011/08/18/why-we-cant-take-expected-value-estimates-literally-even-when-theyre-unbiased/">epistemology</a> of effective altruism in addition to focusing on their chosen causes. And, of course, most EA organizations spend&nbsp;<em>some</em> resources growing the EA movement. &nbsp;</p> <h4><br /></h4> <h4>Focus Area 3: The Long-Term Future</h4> <p>Many EAs value future people roughly as much as currently-living people, and think that nearly all potential value is found in the well-being of the astronomical numbers of people who could populate the long-term future (<a href="http://intelligence.org/files/AstronomicalWaste.pdf">Bostrom 2003</a>; <a href="https://sites.google.com/site/nbeckstead/research/Beckstead%2C%20Nick--On%20the%20Overwhelming%20Importance%20of%20Shaping%20the%20Far%20Future.pdf">Beckstead 2013</a>). Future-focused EAs aim to somewhat-directly&nbsp;capture these "astronomical benefits" of the long-term future, e.g. via explicit efforts to <a href="http://www.existential-risk.org/concept.pdf">reduce existential risk</a>.</p> <p>Organizations in this focus area include:</p> <ul> <li><span style="line-height: 13px;">The <a href="http://www.fhi.ox.ac.uk/">Future of Humanity Institute</a> at Oxford University is the primary hub of research on <a href="http://www.existential-risk.org/concept.pdf">existential risk mitigation</a> within the effective altruism movement. (<a href="http://cser.org/">CSER</a> may join it soon, if it gets funding.)</span></li> <li>The <a href="http://intelligence.org/">Machine Intelligence Research Institute</a>&nbsp;focuses on doing the research needed for humanity to one day build <a href="http://en.wikipedia.org/wiki/Friendly_artificial_intelligence">Friendly AI</a> that could make astronomical numbers of future people enormously better off. It also runs the <a href="/">Less Wrong</a> group blog and forum, where much of today's EA analysis and discussion occurs.</li> </ul> <p>Other groups study particular existential risks (among other things), though perhaps not explicitly from the view of effective altruism. For example, NASA has spent time <a href="http://www.givewell.org/shallow/asteroid-detection">identifying nearby asteroids</a> that could be an existential threat, and many organizations (e.g. <a href="http://gcrinstitute.org/">GCRI</a>) study worst-case scenarios for climate change or nuclear warfare that <em>might</em> result in human extinction but are more likely to result in "merely catastrophic" damage.</p> <p>Some EAs (e.g.&nbsp;<a href="http://blog.givewell.org/2013/05/15/flow-through-effects/">Holden Karnofsky</a>,&nbsp;<a href="http://rationalaltruist.com/2013/06/03/my-outlook/">Paul Christiano</a>) have argued that even if nearly all value lies in the long-term future, focusing on nearer-term goals (e.g. effective poverty reduction or meta effective altruism) may be more likely to realize that value than more direct efforts.</p> <p>&nbsp;</p> <h4>Focus Area 4: Animal Suffering</h4> <p>Effective animal altruists are focused on reducing animal suffering in cost-effective ways. 
After all, animals vastly outnumber humans, and growing numbers of scientists <a href="http://en.wikipedia.org/wiki/Animal_consciousness#Cambridge_Declaration_on_Consciousness">believe</a> that many animals <a href="http://en.wikipedia.org/wiki/Animal_consciousness">consciously experience</a> pleasure and suffering.</p> <p>The only organization of this type so far (that I know of) is <a href="http://www.effectiveanimalactivism.org/">Effective Animal Activism</a>, which currently recommends supporting <a href="http://thehumaneleague.com/">The Humane League</a> and <a href="http://www.veganoutreach.org/">Vegan Outreach</a>.</p> <p><em>Edit</em>: There is now also <a href="http://www.animal-ethics.org/">Animal Ethics, Inc</a>.</p> <p>Major inspirations for those in this focus area include <a href="http://en.wikipedia.org/wiki/Peter_Singer">Peter Singer</a>, <a href="http://en.wikipedia.org/wiki/David_Pearce_(philosopher)">David Pearce</a>, and <a href="http://utilitarian-essays.com/">Brian Tomasik</a>. &nbsp;</p> <h4><br /></h4> <h4>Other focus areas</h4> <p>I could perhaps have listed "effective environmental altruism" as focus area 5. The environmental movement <em>in general</em>&nbsp;is large and well-known, but I'm not aware of many effective altruists who take environmentalism to be the most important cause for them to work on, after closely investigating the above focus areas. In contrast, the groups and people named above&nbsp;tend to have influenced each other, and have considered all these focus areas explicitly. For this reason, I've left "effective environmental altruism" off the list, though perhaps a popular focus on effective environmental altruism could arise in the future.</p> <p>Other focus areas could later come to prominence, too.</p> <h4><br /></h4> <h4>Working together</h4> <p>I was pleased to see the EAs from different strands of the EA movement cooperating and learning from each other at the Effective Altruism Summit. Cooperation is crucial for growing the EA movement, so I hope that even if it&rsquo;s <a href="/lw/3h/why_our_kind_cant_cooperate/">not always easy</a>, EAs will "go out of their way" to cooperate and work together, no matter which focus areas they&rsquo;re sympathetic to.</p> lukeprog JmmA2Mf5GrY9D6nQD 2013-07-09T00:59:40.963Z Responses to Catastrophic AGI Risk: A Survey https://www.lesswrong.com/posts/EAp3AQJv8dzTqAdKW/responses-to-catastrophic-agi-risk-a-survey <p>A great many Less Wrongers gave feedback on earlier drafts of "Responses to Catastrophic AGI Risk: A Survey," which has now been <a href="http://intelligence.org/2013/07/08/responses-to-catastrophic-agi-risk-a-survey/">released</a>. This is the preferred discussion page for the paper.</p> <p>The report, co-authored by past MIRI researcher Kaj Sotala and University of Louisville&rsquo;s Roman Yampolskiy, is a summary of the extant literature (250+ references) on AGI risk, and can serve either as a guide for researchers or as an introduction for the uninitiated.</p> <p>Here is the abstract:</p> <blockquote> <p>Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next twenty to one hundred years. It has been suggested that AGI may pose a catastrophic risk to humanity. After summarizing the arguments for why AGI may pose such a risk, we survey the field&rsquo;s proposed responses to AGI risk. 
We consider societal proposals, proposals for external constraints on AGI behaviors, and proposals for creating AGIs that are safe due to their internal design.</p> </blockquote> lukeprog EAp3AQJv8dzTqAdKW 2013-07-08T14:33:50.800Z Start Under the Streetlight, then Push into the Shadows https://www.lesswrong.com/posts/hRAzDwwMuu8CZSTvN/start-under-the-streetlight-then-push-into-the-shadows <p><img style="float: right; padding: 10px;" src="http://commonsenseatheism.com/wp-content/uploads/2013/06/under-the-streetlight.jpg" alt="" /><small>See also: <a href="/lw/8ns/hack_away_at_the_edges/">Hack Away at the Edges</a>.</small></p> <h3>The streetlight effect</h3> <p>You've heard <a href="http://commonsenseatheism.com/wp-content/uploads/2013/06/Freedman-Why-Scientific-Studies-Are-So-Often-Wrong-The-Streetlight-Effect.pdf">the joke</a> before:</p> <blockquote> <p>Late at night, a police officer finds a drunk man crawling around on his hands and knees under a streetlight. The drunk man tells the officer he&rsquo;s looking for his wallet. When the officer asks if he&rsquo;s sure this is where he dropped the wallet, the man replies that he thinks he more likely dropped it across the street. Then why are you looking over here? the befuddled officer asks. Because the light&rsquo;s better here, explains the drunk man.</p> </blockquote> <p>The joke illustrates the <a href="http://en.wikipedia.org/wiki/Streetlight_effect">streetlight effect</a>: we "<a href="http://www.nmji.in/archives/Volume-24/Issue-5/Speaking-For-OurSelves-ACAnand.pdf">tend to</a> look for answers where the looking is good, rather than where the answers are likely to be hiding."</p> <p><a href="http://www.amazon.com/Wrong-us---Scientists-relationship-consultants/dp/B005DI6QAM/">Freedman (2010)</a> documents at length some harms caused by the streetlight effect. For <a href="http://commonsenseatheism.com/wp-content/uploads/2013/06/Freedman-Why-Scientific-Studies-Are-So-Often-Wrong-The-Streetlight-Effect.pdf">example</a>:</p> <blockquote> <p>A bolt of excitement ran through the field of cardiology in the early 1980s when anti-arrhythmia drugs burst onto the scene. Researchers knew that heart-attack victims with steady heartbeats had the best odds of survival, so a medication that could tamp down irregularities seemed like a no-brainer. The drugs became the standard of care for heart-attack patients and were soon smoothing out heartbeats in intensive care wards across the United States.</p> <p>But in the early 1990s, cardiologists realized that the drugs were also doing something else: killing about 56,000 heart-attack patients a year. Yes, hearts were beating more regularly on the drugs than off, but their owners were, on average, one-third as likely to pull through. 
Cardiologists had been so focused on immediately measurable arrhythmias that they had overlooked the longer-term but far more important variable of <em>death</em>.</p> </blockquote> <h3><a id="more"></a><br /></h3> <h3>Start under the streetlight</h3> <p>Of course, there are <a href="http://commonsenseatheism.com/wp-content/uploads/2013/06/Freedman-Why-Scientific-Studies-Are-So-Often-Wrong-The-Streetlight-Effect.pdf">good reasons</a> to search under the streetlight:</p> <blockquote> <p>It is often extremely difficult or even impossible to cleanly measure what is really important, so scientists instead cleanly measure what they can, hoping it turns out to be relevant.</p> </blockquote> <p>In retrospect, we might wish cardiologists had done a decade-long longitudinal study measuring the long-term effects of the new&nbsp;anti-arrhythmia&nbsp;drugs of the 1980s. But it's easy to understand why they didn't. Decades-long longitudinal studies are expensive, and resources are limited. It was more efficient to rely on an easily-measurable proxy variable like arrhythmias.</p> <p>We must remember, however, that the analogy to the streetlight joke isn't exact. Searching under the streetlight gives the drunkard virtually <em>no</em> information about where his wallet might be. But in science and other disciplines, searching under the streetlight can reveal helpful clues about the puzzle you're investigating. Given limited resources, it's often best to start searching under the streetlight and then, initial clues in hand, push into the shadows.<sup>1</sup></p> <p>The problem with streetlight science isn't that it relies on easily-measurable proxy variables. If you want to figure out how some psychological trait works, start with a small study and use free undergraduates at your home university &mdash; that's a good way to test hypotheses cheaply. The problem comes in when researchers don't appropriately <em>flag</em> the fact their subjects were <a href="http://www.psmag.com/magazines/pacific-standard-cover-story/joe-henrich-weird-ultimatum-game-shaking-up-psychology-economics-53135/">WEIRD</a> and that a larger study needs to be done on a more representative population before we start drawing conclusions. (Another problem is that despite some researcher's cautions against overgeneralizing from a study of WEIRD subjects, the media will write splashy, universalizing headlines anyway.)</p> <p>But money and time aren't the only resources that might be limited. Another is <em>human reasoning ability</em>. Human brains were built for hunting and gathering in the savannah, not for unlocking the mysteries of fundamental physics or intelligence or consciousness. So even if time and money aren't limiting factors, it's often best to break a complex problem into pieces and think through the simplest pieces, or the pieces for which our data are most robust, before trying to answer the questions you <em>most</em> want to solve.</p> <p>As P&oacute;lya advises in his hugely popular <em><a href="http://en.wikipedia.org/wiki/How_to_Solve_It">How to Solve It</a></em>, "If you cannot solve the proposed problem, try to solve first some related [but easier] problem." In physics, this related but easier problem is often called a <a href="http://en.wikipedia.org/wiki/Toy_model">toy model</a>. In other fields, it is sometimes called a <a href="http://en.wikipedia.org/wiki/Toy_problem">toy problem</a>. 
<a href="http://en.wikipedia.org/wiki/Animal_model">Animal models</a> are often used as toy models in biology and medicine.</p> <p>Or, as Scott Aaronson <a href="http://www.scottaaronson.com/blog/?p=346">put it</a>:</p> <blockquote> <p>...I don&rsquo;t spend my life thinking about P versus NP [because] there are vastly easier prerequisite questions that we already don&rsquo;t know how to answer. In a field like [theoretical computer science], you very quickly get used to being able to state a problem with perfect clarity, knowing exactly what would constitute a solution, and still not having any clue how to solve it... And at least in my experience, being pounded with this situation again and again slowly reorients your worldview... Faced with a [very difficult question,] you learn to respond: &ldquo;What&rsquo;s another question that&rsquo;s easier to answer, and that probably has to be answered anyway before we have any chance on the original one?&rdquo;</p> </blockquote> <p>I'll close with two examples: <a href="http://www.givewell.org/">GiveWell</a> on <a href="http://www.ted.com/talks/peter_singer_the_why_and_how_of_effective_altruism.html">effective altruism</a> and <a href="http://intelligence.org/">MIRI</a> on <a href="/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/">stability under self-modification</a>.</p> <h3><br /></h3> <h3>GiveWell on effective altruism</h3> <p>GiveWell's <a href="http://www.givewell.org/about/FAQ">mission</a> is "to find outstanding giving opportunities and publish the full details of our analysis to help donors decide where to give."</p> <p>But finding and verifying outstanding giving opportunities is <em>hard</em>. Consider the case of one straightforward-seeming intervention: <a href="http://en.wikipedia.org/wiki/Deworming">deworming</a>.</p> <p>Nearly 2 billion people (mostly in poor countries) are infected by parasitic worms that hinder their cognitive development and overall health. This is also producing barriers to economic development where parasitic worms are common. Luckily, deworming pills are cheap, and early studies indicated that they <a href="http://www.nytimes.com/2009/12/24/opinion/24kristof.html?_r=2&amp;">improved</a> <a href="http://www.nytimes.com/2009/12/24/opinion/24kristof.html?_r=2&amp;">educational</a> <a href="http://commonsenseatheism.com/wp-content/uploads/2013/06/Miguel-Kremer-Worms-identifying-impacts-on-education-and-health-in-the-presence-of-treatment.pdf">outcomes</a>. The <a href="http://www.dcp2.org/pubs/DCP">DCP2</a>, produced by over 300 contributors and in collaboration with the World Health Organization, estimated that a particular deworming treatment was one of the most cost-effective treatments in global health, at just $3.41 per <a href="http://en.wikipedia.org/wiki/Disability-adjusted_life_year">DALY</a>.</p> <p>Unfortunately, things are not so simple. A <a href="http://www.thecochranelibrary.com/SpringboardWebApp/userfiles/ccoch/file/Neglected%20tropical%20diseases/CD000371.pdf">careful review</a> of the evidence in 2008 by The Cochrane Collaboration concluded that, due to weaknesses in some studies' designs and other factors, "No effect [of deworming drugs] on cognition or school performance has been demonstrated." 
And in 2011, GiveWell <a href="http://blog.givewell.org/2011/09/29/errors-in-dcp2-cost-effectiveness-estimate-for-deworming/">found</a> that a spreadsheet used to produce the DCP2's estimates contained <em>5 separate errors</em> that, when corrected, increased the cost estimate for deworming by roughly <em>a factor of 100</em>. In 2012, <a href="http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD000371.pub5/abstract">another Cochrane review</a> was even more damning for the effectiveness of deworming, concluding that "Routine deworming drugs given to school children... has not shown benefit on weight in most studies... For haemoglobin and cognition, community deworming seems to have little or no effect, and the evidence in relation to school attendance, and school performance is generally poor, with no obvious or consistent effect."</p> <p>On the other hand, Innovations for Poverty Action <a href="http://poverty-action.org/blog/cochrane%E2%80%99s-incomplete-and-misleading-summary-evidence-deworming">critiqued</a> the 2012 Cochrane review, and GiveWell <a href="http://blog.givewell.org/2012/07/13/new-cochrane-review-of-the-effectiveness-of-deworming/">said</a> the review did not fully undermine the case for its <a href="http://www.givewell.org/international/top-charities/schistosomiasis-control-initiative">#3 recommended charity</a>, which focuses on deworming.</p> <p>What are we to make of this? Thousands of hours of data collection and synthesis went into producing the initial case for deworming as a cost-effective intervention, and thousands of additional hours were required to discover flaws in those initial analyses. In the end, GiveWell recommends one deworming charity, the Schistosomiasis Control Initiative, but their <a href="http://www.givewell.org/international/top-charities/schistosomiasis-control-initiative">page on SCI</a> is littered with qualifications and concerns and "We don't know"s.</p> <p>GiveWell had to wrestle with these complications despite the fact that it <em>chose</em> to search under the streetlight. Global health interventions are among the <em>easiest</em> interventions to analyze, and have often been subjected to multiple randomized controlled trials and dozens of experimental studies. Such high-quality evidence usually isn't available when trying to estimate the cost-effectiveness of, say, certain forms of political activism.</p> <p>GiveWell co-founder Holden Karnofsky suspects that the best giving opportunities are <em>not</em> in the domain of global health, but GiveWell began their search in global health &mdash; under the spotlight &mdash; (in part) because the evidence was clearer there.<sup>2</sup></p> <p>It's <a href="http://www.amazon.com/Unmaking-West-What-If-Scenarios-Rewrite/dp/0472031430/">difficult</a> to do counterfactual history, but I suspect GiveWell made the right choice. 
While investigating global health, GiveWell has learned <a href="http://blog.givewell.org/2013/05/02/broad-market-efficiency/">many</a> <a href="http://blog.givewell.org/2013/03/21/trying-and-failing-to-find-more-funding-gaps-for-delivering-proven-cost-effective-interventions/">important</a> <a href="http://blog.givewell.org/2012/10/25/evaluating-people/">lessons</a> <a href="http://blog.givewell.org/2012/09/13/updated-thoughts-on-our-key-criteria/">about</a> <a href="http://blog.givewell.org/2012/07/20/some-history-behind-our-shifting-approach-to-research/">effective</a> <a href="http://blog.givewell.org/2011/11/10/maximizing-cost-effectiveness-via-critical-inquiry/">altruism</a> &mdash; lessons it would have been more difficult to learn with the same clarity if they had begun with investigations of even-more-challenging domains like <a href="http://blog.givewell.org/2013/06/06/meta-research-update/">meta-research</a> and pollitical activism. But now that they've learned those lessons, they're beginning to push into the shadows where the evidence is less clear, via <a href="http://blog.givewell.org/category/givewell-labs/">GiveWell Labs</a>.</p> <h3><br /></h3> <h3>MIRI on stability under self-modification</h3> <p>MIRI's <a href="http://intelligence.org/about/">mission</a> is "to ensure that the creation of smarter-than-human intelligence has a positive impact."</p> <p>Many different interventions have been <a href="/lw/ffh/how_can_i_reduce_existential_risk_from_ai/">proposed</a> as methods for increasing the odds that smarter-than-human intelligence has a positive impact, but for <a href="http://intelligence.org/2013/04/13/miris-strategy-for-2013/">several reasons</a> MIRI decided to focus its efforts on "Friendly AI research" during 2013.</p> <p>The FAI research program decomposes into a wide variety of technical research questions. One of those questions is the question of <em>stability under self-modification</em>:</p> <blockquote> <p>How can we ensure that an AI will serve its intended purpose even after repeated self-modification?</p> </blockquote> <p>This is a challenging and ill-defined question. How might we make progress on such a puzzle?</p> <p>For puzzles such as this one, Scott Aaronson <a href="http://arxiv.org/pdf/1306.0159v2.pdf">recommends</a> a strategy he calls "bait and switch":</p> <blockquote> <p>[Philosophical] progress has almost always involved a [kind of] &ldquo;bait-and-switch.&rdquo; In other words: one replaces an unanswerable philosophical riddle Q by a &ldquo;merely&rdquo; scientific or mathematical question Q&prime;, which captures part of what people have wanted to know when they&rsquo;ve asked Q. Then, with luck, one solves Q&prime;... this process of &ldquo;breaking off&rdquo; answerable parts of unanswerable riddles, then trying to answer those parts, is the closest thing to philosophical progress that there is.</p> <p>Successful examples of this breaking-off process fill intellectual history. 
The use of calculus to treat infinite series, the link between mental activity and nerve impulses, natural selection, set theory and first-order logic, special relativity, G&ouml;del&rsquo;s theorem, game theory, information theory, computability and complexity theory, the Bell inequality, the theory of common knowledge, Bayesian causal networks &mdash; each of these advances addressed questions that could rightly have been called &ldquo;philosophical&rdquo; before the advance was made.</p> </blockquote> <p>The recent MIRI report on <a href="http://intelligence.org/files/TilingAgents.pdf">Tiling Agents</a> performs one such "bait and switch." It replaces the philosophical puzzle of "How can we ensure that an AI will serve its intended purpose even after repeated self-modification?" (Q) with a better-specified <em>formal</em> puzzle on which it is possible to make measurable progress: "How can an agent perform perfectly tiling self-modifications despite L&ouml;b's Theorem?" (Q')</p> <p>This allows us to state <a href="/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/94dx">at least three</a> crisp technical problems: L&ouml;b and coherent quantified belief (sec. 3 of 'Tiling Agents'), nonmonotonicity of probabilistic reasoning (secs. 5.2 &amp; 7), and maximizing/satisficing not being satisfactory for bounded agents (sec. 8). It also allows us to identify progress: formal results that mankind had not previously uncovered (sec. 4).</p> <p>Of course, even if Q' is eventually solved, we'll need to check whether there are other pieces of Q we need to solve. Or perhaps Q will have been <em>dissolved</em> by our efforts to solve Q', similar to how the question "What force distinguishes living matter from non-living matter?" was dissolved by 20th century biology.</p> <h4><br /></h4> <h4><br /></h4> <h4>Notes</h4> <p><sup>1</sup> <small><a href="http://blog.givewell.org/2011/05/27/in-defense-of-the-streetlight-effect/">Karnofsky (2011)</a> suggests that it may often be best to start under the streetlight <em>and stay there</em>, at least in the context of effective altruism. Karnofsky asks, "What does it look like when we build knowledge only where we&rsquo;re best at building knowledge, rather than building knowledge on the 'most important problems?'" His reply is: "Researching topics we&rsquo;re good at researching can have a lot of benefits, some unexpected, some pertaining to problems we never expected such research to address. Researching topics we&rsquo;re bad at researching doesn&rsquo;t seem like a good idea no matter how important the topics are. Of course I&rsquo;m in favor of thinking about how to develop new research methods to make research good at what it was formerly bad at, but I&rsquo;m against applying current problematic research methods to current projects just because they&rsquo;re the best methods available." Here's one example: "what has done more for political engagement in the U.S.: studying how to improve political engagement, or studying the technology that led to the development of the Internet, the World Wide Web, and ultimately to sites like Change.org...?" I am sympathetic with Karnofsky's view in many cases, but I will give two points of reply with respect to my post above. First, in the above post I wanted to focus on the question of how to tackle difficult questions, not the question of whether difficult questions should be tackled in the first place. 
And conditional on one's choice to tackle a difficult question, I recommend one start under the streetlight and push into the shadows. Second, my guess is that I'm talking about a broader notion of the streetlight effect than Karnofsky is. For example, I doubt Karnofsky would object to the process of tackling a problem in theoretical computer science or math by trying to solve easier, related problems first.</small></p> <p><sup>2</sup> <small>In GiveWell's January 24th, 2013 board meeting (starting at 6:35 in <a href="http://www.givewell.org/files/ClearFund/Meeting%202013%2001%2024/Board%20call%202013%2001%2024.mp3">the MP3 recording</a>), GiveWell co-founder Holden Karnofsky said that interventions outside global health are "where we would bet today that we'll find... the best giving opportunities... that best fulfill GiveWell's mission as originally [outlined] in the mission statement." This doesn't appear to be a recently acquired view of things, either. Starting at 22:47 in the same recording, Karnofsky says "There were reasons that we focused on [robustly evidence-backed] interventions for GiveWell initially, but... the [vision] I've been pointing to [of finding giving opportunities outside global health, where less evidence is available]... has [to me] been the vision all along." In personal communication with me, Karnofsky wrote that "We sought to start 'under the streetlight,' as you say, and so focused on finding opportunities to fund things with strong documented evidence of being 'proven, cost-effective and scalable.' Initially we looked at both U.S. and global interventions, and within developing-world interventions we looked at health but also economic empowerment. We ended up focusing on global health because it performed best by these criteria."</small></p> lukeprog hRAzDwwMuu8CZSTvN 2013-06-24T00:49:22.961Z Elites and AI: Stated Opinions https://www.lesswrong.com/posts/ToNNGwqNS5kecZaNQ/elites-and-ai-stated-opinions <p>Previously, I asked "<a href="/lw/hlc/will_the_worlds_elites_navigate_the_creation_of/">Will the world's elites navigate the creation of AI just fine?</a>" My current answer is "probably not," but I think it's a question worth additional investigation.</p> <p>As a preliminary step, and with the help of MIRI interns Jeremy Miller and Oriane Gaillard, I've collected a few <strong>stated opinions</strong> on the issue. This survey of stated opinions is not <em>representative</em> of any particular group, and is not meant to provide strong evidence about what is <em>true</em> on the matter. It's merely a collection of quotes we happened to find on the subject. Hopefully others can point us to other stated opinions &mdash; or state their own opinions.</p> <p><a id="more"></a></p> <p><a href="http://intelligence.org/">MIRI</a> researcher <strong><a href="http://yudkowsky.net/">Eliezer Yudkowsky</a></strong> is famously pessimistic on this issue. For example, in a <a href="/lw/15x/friendlier_ai_through_politics/11q8">2009 comment</a>, he replied to the question "What kind of competitive or political system would make fragmented squabbling AIs safer than an attempt to get the monolithic approach right?" by saying "the answer is, 'None.' 
It's like asking how you should move your legs to walk faster than a jet plane" &mdash; again, implying extreme skepticism that political elites will manage AI properly.<sup>1</sup></p> <p>Cryptographer <strong><a href="http://www.weidai.com/">Wei Dai</a></strong> is also <a href="/lw/hlc/will_the_worlds_elites_navigate_the_creation_of/9ag2">quite pessimistic</a>:</p> <blockquote> <p>...even in a relatively optimistic scenario, one with steady progress in AI capability along with apparent progress in AI control/safety (and nobody deliberately builds a UFAI for the sake of "maximizing complexity of the universe" or what have you), it's probably only a matter of time until some AI crosses a threshold of intelligence and manages to "throw off its shackles". This may be accompanied by a last-minute scramble by mainstream elites to slow down AI progress and research methods of scalable AI control, which (if it does happen) will likely be too late to make a difference.</p> </blockquote> <p>Stanford philosopher <strong><a href="http://www.stanford.edu/~ktaylor/">Ken Taylor</a></strong> has also expressed pessimism, in an episode of <em>Philosophy Talk</em> called "<a href="https://dl.dropboxusercontent.com/u/163098/_temp/Turbo-charging%20the%20Mind.mp3">Turbo-charging the mind</a>":</p> <blockquote> <p>Think about nuclear technology. It evolved in a time of war... The probability that nuclear technology was going to arise at a time when we use it well rather than [for] destruction was low... Same thing with... superhuman artificial intelligence. It's going to emerge... in a context in which we make a mess out of everything. So the probability that we make a mess out of this is really high.</p> </blockquote> <p>Here, Taylor seems to express the view that humans are not yet morally and rationally advanced enough to be trusted with powerful technologies. This general view has been expressed before by many others, including Albert Einstein, who <a href="http://en.wikiquote.org/wiki/Albert_Einstein">wrote</a> that "Our entire much-praised technological progress... could be compared to an axe in the hand of a pathological criminal."</p> <p>In response to Taylor's comment, MIRI researcher <strong>Anna Salamon</strong> (now Executive Director of <a href="http://rationality.org/about/">CFAR</a>) expressed a more optimistic view:</p> <blockquote> <p>I... disagree. A lot of my colleagues would [agree with you] that 40% chance of human survival is absurdly optimistic... But, probably we're not close to AI. Probably by the time AI hits we will have had more thinking going into it... [Also,] if the Germans had successfully gotten the bomb and taken over the world, there would have been somebody who profited. If AI runs away and kills everyone, there's nobody who profits. There's a lot of incentive to try and solve the problem together...</p> </blockquote> <p>Economist <strong><a href="http://www.smith.edu/economics/faculty_miller.php">James Miller</a></strong> is another voice of pessimism. In <em><a href="http://www.amazon.com/Singularity-Rising-Surviving-Thriving-Dangerous/dp/1936661659/">Singularity Rising</a></em>, chapter 5, he worries about game-theoretic mechanisms incentivizing speed of development over safety of development:</p> <blockquote> <p>Successfully creating [superhuman AI] would give a country control of everything, making [superhuman AI] far more militarily useful than mere atomic weapons. 
The first nation to create an obedient [superhuman AI] would also instantly acquire the capacity to terminate its rivals&rsquo; AI development projects. Knowing the stakes, rival nations might go full throttle to win [a race to superhuman AI], even if they understood that haste could cause them to create a world-destroying [superhuman AI]. These rivals might realize the danger and desperately wish to come to an agreement to reduce the peril, but they might find that the logic of the widely used game theory paradox of the Prisoners&rsquo; Dilemma thwarts all cooperation efforts... Imagine that both the US and Chinese militaries want to create [superhuman AI]. To keep things simple, let&rsquo;s assume that each military has the binary choice to proceed either slowly or quickly. Going slowly increases the time it will take to build [superhuman AI] but reduces the likelihood that it will become unfriendly and destroy humanity. The United States and China might come to an agreement and decide that they will both go slowly... [But] if the United States knows that China will go slowly, it might wish to proceed quickly and accept the additional risk of destroying the world in return for having a much higher chance of being the first country to create [superhuman AI]. (During the Cold War, the United States and the Soviet Union risked destroying the world for less.) The United States might also think that if the Chinese proceed quickly, then they should go quickly, too, rather than let the Chinese be the likely winners of the... race.</p> </blockquote> <p>In chapter 6, Miller expresses similar worries about corporate incentives and AI:</p> <blockquote> <p>Paradoxically and tragically, the fact that [superhuman AI] would destroy mankind increases the chance of the private sector developing it. To see why, pretend that you&rsquo;re at the racetrack deciding whether to bet on the horse Recursive Darkness. The horse offers a good payoff in the event of victory, but her odds of winning seem too small to justify a bet&mdash;until, that is, you read the fine print on the racing form: "If Recursive Darkness loses, the world ends." Now you bet everything you have on her because you realize that the bet will either pay off or become irrelevant.</p> </blockquote> <p>Miller expanded on some of these points in his chapter in <em><a href="http://www.amazon.com/Singularity-Hypotheses-Scientific-Philosophical-Assessment/dp/3642325599/">Singularity Hypotheses</a></em>.</p> <p>In a short reply to Miller, GMU economist <strong><a href="http://hanson.gmu.edu/">Robin Hanson</a></strong> wrote that</p> <blockquote> <p>[Miller's analysis is] only as useful as the assumptions on which it is based. Miller's chosen assumptions seem to me quite extreme, and quite unlikely.</p> </blockquote> <p>Unfortunately, Hanson does not explain his reasons for rejecting Miller's analysis.</p> <p>Sun Microsystems co-founder <strong><a href="http://en.wikipedia.org/wiki/Bill_Joy">Bill Joy</a></strong> is famous for the techno-pessimism of his <em>Wired</em> essay "<a href="http://www.wired.com/wired/archive/8.04/joy.html">Why the Future Doesn't Need Us</a>," but that article's predictions about elites' likely handling of AI are actually somewhat mixed:</p> <blockquote> <p>we all wish our course could be determined by our collective values, ethics, and morals. 
If we had gained more collective wisdom over the past few thousand years, then a dialogue to this end would be more practical, and the incredible powers we are about to unleash would not be nearly so troubling.</p> <p>One would think we might be driven to such a dialogue by our instinct for self-preservation. Individuals clearly have this desire, yet as a species our behavior seems to be not in our favor. In dealing with the nuclear threat, we often spoke dishonestly to ourselves and to each other, thereby greatly increasing the risks. Whether this was politically motivated, or because we chose not to think ahead, or because when faced with such grave threats we acted irrationally out of fear, I do not know, but it does not bode well.</p> <p>The new Pandora's boxes of genetics, nanotechnology, and robotics are almost open, yet we seem hardly to have noticed... Churchill remarked, in a famous left-handed compliment, that the American people and their leaders 'invariably do the right thing, after they have examined every other alternative.' In this case, however, we must act more presciently, as to do the right thing only at last may be to lose the chance to do it at all...</p> <p>...And yet I believe we do have a strong and solid basis for hope. Our attempts to deal with weapons of mass destruction in the last century provide a shining example of relinquishment for us to consider: the unilateral US abandonment, without preconditions, of the development of biological weapons. This relinquishment stemmed from the realization that while it would take an enormous effort to create these terrible weapons, they could from then on easily be duplicated and fall into the hands of rogue nations or terrorist groups.</p> </blockquote> <p>Former GiveWell researcher <strong><a href="http://www.mathisbeauty.org/aboutme.html">Jonah Sinick</a></strong> has <a href="/lw/hlc/will_the_worlds_elites_navigate_the_creation_of/92tx">expressed optimism</a> on the issue:</p> <blockquote> <p>I personally am optimistic about the world's elites navigating AI risk as well as possible subject to inherent human limitations that I would expect everybody to have, and the inherent risk. Some points:</p> <ol> <li> <p>I've been surprised by people's ability to avert bad outcomes. Only two nuclear weapons have been used since nuclear weapons were developed, despite the fact that there are 10,000+ nuclear weapons around the world. Political leaders are assassinated very infrequently relative to how often one might expect a priori.</p> </li> <li> <p>AI risk is a Global Catastrophic Risk in addition to being an x-risk. Therefore, even people who don't care about the far future will be motivated to prevent it.</p> </li> <li> <p>The people with the most power tend to be the most rational people, and the effect size can be expected to increase over time... The most rational people are the people who are most likely to be aware of and to work to avert AI risk...</p> </li> <li> <p>Availability of information is increasing over time. At the time of the Dartmouth conference, information about the potential dangers of AI was not very salient, now it's more salient, and in the future it will be still more salient...</p> </li> <li> <p>In the Manhattan project, the "will bombs ignite the atmosphere?" question was analyzed and dismissed without much (to our knowledge) double-checking. The amount of risk checking per hour of human capital available can be expected to increase over time. 
In general, people enjoy tackling important problems, and risk checking is more important than most of the things that people would otherwise be doing.</p> </li> </ol></blockquote> <p><strong><a href="http://rationalaltruist.com/">Paul Christiano</a></strong> is another voice of <a href="http://ordinaryideas.wordpress.com/2013/01/22/some-rambling-ai-prognostication/">optimism</a> about elites' handling of AI. Here are some snippets from his "mainline" scenario for AI development:</p> <blockquote> <p>It becomes fairly clear some time in advance, perhaps years, that broadly human-competitive AGI will be available soon. As this becomes obvious, competent researchers shift into more directly relevant work, and governments and researchers become more concerned with social impacts and safety issues...</p> <p>Call the point where the share of human workers is negligible point Y. After Y humans are very unlikely to maintain control over global economic dynamics---the effective population is overwhelmingly dominated by machine intelligences... This picture becomes clear to serious onlookers well in advance of the development of human-level AGI... [hence] there is much intellectual activity aimed at understanding these dynamics and strategies for handling them, carried out both in public and within governments.</p> <p>Why should we expect the control problem to be solved? ...at each point when we face a control problem more difficult than any we have faced so far and with higher consequences for failure, we expect to have faced slightly easier problems with only slightly lower consequences for failure in the past.</p> <p>As long as solutions to the control problem are not quite satisfactory, the incentives to resolve control problems are comparable to the incentives to increase the capabilities of systems. If solutions are particularly unsatisfactory, then incentives to resolve control problems are very strong. So natural economic incentives build a control system (in the traditional sense from robotics) which keeps solutions to the control problem from being too unsatisfactory.</p> </blockquote> <p>Christiano is no Pollyanna, however. In the same document, he outlines "what could go wrong," and what we might do about it.</p> <p>&nbsp;</p> <p><small>Notes</small></p> <p><sup>1</sup> <small>I originally included another quote from Eliezer, but then I noticed that other readers on Less Wrong had elsewhere interpreted that same quote differently than I had, so I removed it from this post.</small></p> lukeprog ToNNGwqNS5kecZaNQ 2013-06-15T19:52:36.207Z Will the world's elites navigate the creation of AI just fine? https://www.lesswrong.com/posts/Ba8LNjWKDF5nrn9Q6/will-the-world-s-elites-navigate-the-creation-of-ai-just <p>One open question in AI risk strategy is: Can we trust the world's elite decision-makers (hereafter "elites") to navigate the creation of human-level AI (and beyond) just fine, without the kinds of special efforts that e.g. Bostrom and Yudkowsky think are needed?</p> <p>Some reasons for <em>concern</em> include:</p> <ul> <li>Otherwise smart people say unreasonable things about AI safety. 
</li> <li>Many people who believed AI was around the corner didn't take safety very seriously.</li> <li>Elites have failed to navigate many important issues wisely (2008 financial crisis, climate change, Iraq War, etc.), for a variety of reasons.</li> <li>AI may arrive rather suddenly, leaving little time for preparation.</li> </ul> <p>But if you were trying to argue for <em>hope</em>, you might argue along these lines (presented for the sake of argument; I don't actually endorse this argument):</p> <ul> <li><em>If AI is preceded by visible signals, elites are likely to take safety measures.</em> Effective measures were taken to address asteroid risk. Large resources are devoted to mitigating climate change risks. Personal and tribal selfishness align with AI risk-reduction in a way they may not align on climate change. Availability of information is increasing over time.</li> <li><em>AI is likely to be preceded by visible signals.</em> Conceptual insights often take years of incremental tweaking. In vision, speech, games, compression, robotics, and other fields, performance curves are mostly smooth. "Human-level performance at X" benchmarks influence perceptions and should be more exhaustive and come more rapidly as AI approaches. Recursive self-improvement capabilities could be charted, and are likely to be AI-complete. If AI succeeds, it will likely succeed for reasons comprehensible by the AI researchers of the time.</li> <li><em>Therefore, safety measures will likely be taken</em>.</li> <li><em>If safety measures are taken, then elites will navigate the creation of AI just fine.</em> Corporate and government leaders can use simple heuristics (e.g. Nobel prizes) to access the upper end of expert opinion. AI designs with easily tailored tendency to act may be the easiest to build. The use of early AIs to solve AI safety problems creates an attractor for "safe, powerful AI." Arms races are not insurmountable.</li> </ul> <p>The basic structure of this 'argument for hope' is due to Carl Shulman, though he doesn't necessarily endorse the details. (Also, it's just a rough argument, and as stated is not deductively valid.)</p> <p>Personally, I am not very comforted by this argument because:</p> <ul> <li>Elites often fail to take effective action despite plenty of warning.</li> <li>I think there's a &gt;10% chance AI will not be preceded by visible signals.</li> <li>I think the elites' safety measures will likely be insufficient.</li> </ul> <p>Obviously, there's a lot more for me to spell out here, and some of it may be unclear. The reason I'm posting these thoughts in such a rough state is so that <a href="http://intelligence.org/">MIRI</a> can get some help on our research into this question.</p> <p>In particular, I'd like to know:</p> <ul> <li><em>Which historical events are analogous to AI risk in some important ways?</em> Possibilities include: nuclear weapons, climate change, recombinant DNA, nanotechnology, chlorofluorocarbons, asteroids, cyberterrorism, Spanish flu, the 2008 financial crisis, and large wars. </li> <li><em>What are some good resources (e.g. books) for investigating the relevance of these analogies to AI risk</em> (for the purposes of illuminating elites' likely response to AI risk)? 
</li> <li><em>What are some good studies on elites' decision-making abilities in general?</em> </li> <li><em>Has the increasing availability of information in the past century noticeably improved elite decision-making?</em> </li> </ul> lukeprog Ba8LNjWKDF5nrn9Q6 2013-05-31T18:49:10.861Z Help us name the Sequences ebook https://www.lesswrong.com/posts/Hmc2NE6oTipCvuZdN/help-us-name-the-sequences-ebook <p>&nbsp;</p> <p><em><a href="http://www.amazon.com/Quantum-Computing-since-Democritus-Aaronson/dp/0521199565/">Quantum Computing Since Democritus</a></em> got me thinking that we may want a more riveting title for <em>The Sequences, 2006-2009</em> ebook we're preparing for release (like the&nbsp;<a href="http://intelligenceexplosion.com/ebook/">FtIE ebook</a>). Maybe it could be something like <em>[Really Catchy Title]: The Less Wrong Sequences, 2006-2009</em>.</p> <p>The reason for "2006&ndash;2009" is that <a href="http://wiki.lesswrong.com/wiki/Highly_Advanced_Epistemology_101_for_Beginners">Highly Advanced Epistemology 101 for Beginners</a> will be its own ebook, and future Yudkowskian LW sequences (if there are any) won't be included either.</p> <p>&nbsp;</p> <p>Example options:</p> <p>&nbsp;</p> <ul> <li><em>The Craft of Rationality: The Less Wrong Sequences, 2006&ndash;2009</em></li> <li><em>The Art of Rationality: The Less Wrong Sequences, 2006&ndash;2009</em></li> <li><em>Becoming Less Wrong: The Sequences, 2006&ndash;2009</em></li> </ul> <div><br /></div> <div>In the end, we <em>might</em>&nbsp;just call it <em>The Sequences, 2006&ndash;2009</em>, but I'd like to check whether somebody else can come up with a better name.</div> <div><br /></div> <div>Suggestions?</div> <div><br /></div> <div>(Update on 5/5/2013 is <a href="/lw/h7t/help_us_name_the_sequences_ebook/8x0f">here</a>.)</div> <p>&nbsp;</p> <p>&nbsp;</p> lukeprog Hmc2NE6oTipCvuZdN 2013-04-15T19:59:13.969Z Estimate Stability https://www.lesswrong.com/posts/K33mYmEk9LoTbN92L/estimate-stability <p>I've been trying to get clear on something you might call "estimate stability." Steven Kaas recently <a href="http://stats.stackexchange.com/questions/55800/stability-of-estimates">posted my question to StackExchange</a>, but we might as well post it here as well:</p> <blockquote>I'm trying to reason about something I call "estimate stability," and I'm hoping you can tell me whether there&rsquo;s some relevant technical language...</blockquote> <blockquote>What do I mean by "estimate stability?" Consider these three different propositions:</blockquote> <blockquote><ol> <li>We&rsquo;re 50% sure that a coin (known to be fair) will land on heads.</li> <li>We&rsquo;re 50% sure that Matt will show up at the party.</li> <li>We&rsquo;re 50% sure that Strong AI will be invented by 2080.</li> </ol></blockquote> <blockquote>These estimates feel different. One reason they feel different is that the estimates have different degrees of "stability." In case (1) we don't expect to gain information that will change our probability estimate. But for cases (2) and (3), we may well come upon some information that causes us to adjust the estimate either up or down.</blockquote> <blockquote>So estimate (1) is more "stable," but I'm not sure how this should be quantified. Should I think of it in terms of running a Monte Carlo simulation of what future evidence might be, and looking at something like the variance of the distribution of the resulting estimates? What happens when it&rsquo;s a whole probability distribution for e.g. 
the time Strong AI is invented? (Do you calculate the stability of the probability density for every year, then average the result?)</blockquote> <blockquote>Here are some other considerations that would be useful to relate more formally to considerations of estimate stability:</blockquote> <blockquote> <ul> <li>If we&rsquo;re estimating some variable, having a narrow probability distribution (prior to future evidence with respect to which we&rsquo;re trying to assess the stability) corresponds to having a lot of data. New data, in that case, would make less of a contribution in terms of changing the mean and reducing the variance.</li> <li>There are differences in model uncertainty between the three cases. I know what model to use when predicting a coin flip. My method of predicting whether Matt will show up at a party is shakier, but I have some idea of what I&rsquo;m doing. With the Strong AI case, I don&rsquo;t really have any good idea of what I&rsquo;m doing. Presumably model uncertainty is related to estimate stability, because the more model uncertainty we have, the more we can change our estimate by reducing our model uncertainty.</li> <li>Another difference between the three cases is the degree to which our actions allow us to improve our estimates, increasing their stability. For example, we can reduce the uncertainty and increase the stability of our estimate about Matt by calling him, but we don&rsquo;t really have any good ways to get better estimates of Strong AI timelines (other than by waiting).</li> <li>Value-of-information affects how we should deal with delay. Estimates that are unstable in the face of evidence we expect to get in the future seem to imply higher VoI. This creates a reason to accept delays in our actions. Or if we can easily gather information that will make our estimates more accurate and stable, that means we have more reason to pay the cost of gathering that information. If we expect to forget information, or expect our future selves not to take information into account, dynamic inconsistency becomes important. This is another reason why estimates might be unstable. One possible strategy here is to precommit to have our estimates regress to the mean.</li> </ul> </blockquote> <blockquote>Thanks for any thoughts!</blockquote> lukeprog K33mYmEk9LoTbN92L 2013-04-13T18:33:23.799Z Fermi Estimates https://www.lesswrong.com/posts/PsEppdvgRisz5xAHG/fermi-estimates <p>Just before the <a href="http://is.gd/NeBfVu">Trinity test</a>, Enrico Fermi decided he wanted a rough estimate of the blast's power before the diagnostic data came in. So he dropped some pieces of paper from his hand as the blast wave passed him, and used this to estimate that the blast was equivalent to 10 kilotons of TNT. His guess was remarkably accurate for having so little data: the true answer turned out to be 20 kilotons of TNT.</p> <p>Fermi had a knack for making roughly-accurate estimates with very little data, and therefore such an estimate is known today as a <a href="http://en.wikipedia.org/wiki/Fermi_problem">Fermi estimate</a>.</p> <p>Why bother with Fermi estimates, if your estimates are likely to be off by a factor of 2 or even 10? Often, getting an estimate within a factor of 10 or 20 is enough to make a decision. 
So Fermi estimates can save you a lot of time, especially as you gain more practice at making them.</p> <p>&nbsp;</p> <h3>Estimation tips</h3> <p><small>These first two sections are adapted from <em><a href="http://www.amazon.com/Guesstimation-2-0-Solving-Todays-Problems/dp/069115080X/">Guesstimation 2.0</a></em>.</small></p> <p><strong>Dare to be imprecise.</strong> Round things off enough to do the calculations in your head. I call this the <a href="http://en.wikipedia.org/wiki/Spherical_cow">spherical cow principle</a>, after a joke about how physicists oversimplify things to make calculations feasible:</p> <blockquote> <p>Milk production at a dairy farm was low, so the farmer asked a local university for help. A multidisciplinary team of professors was assembled, headed by a theoretical physicist. After two weeks of observation and analysis, the physicist told the farmer, "I have the solution, but it only works in the case of spherical cows in a vacuum."</p> </blockquote> <p>By the spherical cow principle, there are 300 days in a year, people are six feet (or 2 meters) tall, the circumference of the Earth is 20,000 mi (or 40,000 km), and cows are spheres of meat and bone 4 feet (or 1 meter) in diameter.</p> <p><strong>Decompose the problem.</strong> Sometimes you can give an estimate in one step, within a factor of 10. (How much does a new compact car cost? $20,000.) But in most cases, you'll need to break the problem into several pieces, estimate each of them, and then recombine them. I'll give several examples below.</p> <p><strong>Estimate by bounding.</strong> Sometimes it is easier to give lower and upper bounds than to give a point estimate. How much time per day does the average 15-year-old watch TV? I don't spend any time with 15-year-olds, so I haven't a clue. It could be 30 minutes, or 3 hours, or 5 hours, but I'm pretty confident it's more than 2 minutes and less than 7 hours (400 minutes, by the spherical cow principle).</p> <p>Can we convert those bounds into an estimate? You bet. But we don't do it by taking the <em>average</em>. That would give us (2 mins + 400 mins)/2 = 201 mins, which is within a factor of 2 from our upper bound, but a factor of <em>100</em> greater than our lower bound. Since our goal is to estimate the answer within a factor of 10, we'll probably be way off.</p> <p>Instead, we take the <em>geometric mean</em> &mdash; the square root of the product of our upper and lower bounds. But square roots often require a calculator, so instead we'll take the <em>approximate</em> geometric mean (AGM). To do that, we average the coefficients and exponents of our upper and lower bounds.</p> <p>So what is the AGM of 2 and 400? Well, 2 is 2&times;10<sup>0</sup>, and 400 is 4&times;10<sup>2</sup>. The average of the coefficients (2 and 4) is 3; the average of the exponents (0 and 2) is 1. So, the AGM of 2 and 400 is 3&times;10<sup>1</sup>, or 30. The precise geometric mean of 2 and 400 turns out to be 28.28. Not bad.</p> <p>What if the sum of the exponents is an odd number? Then we round the resulting exponent down, and multiply the final answer by three. So suppose my lower and upper bounds for how much TV the average 15-year-old watches had been 20 mins and 400 mins. Now we calculate the AGM like this: 20 is 2&times;10<sup>1</sup>, and 400 is still 4&times;10<sup>2</sup>. The average of the coefficients (2 and 4) is 3; the average of the exponents (1 and 2) is 1.5. 
So we round the exponent down to 1, and we multiply the final result by three: 3(3&times;10<sup>1</sup>) = 90 mins. The precise geometric mean of 20 and 400 is 89.44. Again, not bad.</p> <p><strong>Sanity-check your answer</strong>. You should always sanity-check your final estimate by comparing it to some reasonable analogue. You'll see examples of this below.</p> <p><strong>Use Google as needed</strong>. You can often quickly find the exact quantity you're trying to estimate on Google, or at least some <em>piece</em> of the problem. In those cases, it's probably not worth trying to estimate it <em>without</em> Google.</p> <p><a id="more"></a></p> <h3><br /></h3> <h3>Fermi estimation failure modes</h3> <p>Fermi estimates go wrong in one of three ways.</p> <p>First, we might badly overestimate or underestimate a quantity. Decomposing the problem, estimating from bounds, and looking up particular pieces on Google should protect against this. Overestimates and underestimates for the different pieces of a problem <a href="http://en.wikipedia.org/wiki/Fermi_problem#Explanation">should roughly cancel out</a>, especially when there are many pieces.</p> <p>Second, we might model the problem incorrectly. If you estimate teenage deaths per year on the assumption that most teenage deaths are from suicide, your estimate will probably be way off, because most teenage deaths are caused by accidents. To avoid this, try to decompose each Fermi problem by using a model you're fairly confident of, even if it means you need to use more pieces or give wider bounds when estimating each quantity.</p> <p>Finally, we might choose a nonlinear problem. Normally, we assume that if one object can get some result, then two objects will get twice the result. Unfortunately, this doesn't hold true for nonlinear problems. If one motorcycle on a highway can transport a person at 60 miles per hour, then 30 motorcycles can transport 30 people at 60 miles per hour. However, 10<sup>4</sup> motorcycles cannot transport 10<sup>4</sup> people at 60 miles per hour, because there will be a huge traffic jam on the highway. This problem is difficult to avoid, but with practice you will get better at recognizing when you're facing a nonlinear problem.</p> <p>&nbsp;</p> <h3>Fermi practice</h3> <p>When getting started with Fermi practice, I recommend estimating quantities that you can easily look up later, so that you can see how accurate your Fermi estimates tend to be. Don't look up the answer before constructing your estimates, though! Alternatively, you might allow yourself to look up particular pieces of the problem &mdash; e.g. the <a href="http://en.wikipedia.org/wiki/List_of_religious_populations#Adherent_estimates">number of Sikhs</a> in the world, the formula for <a href="http://en.wikipedia.org/wiki/Escape_velocity">escape velocity</a>, or the <a href="http://en.wikipedia.org/wiki/Gross_world_product">gross world product</a>&nbsp;&mdash; but not the final quantity you're trying to estimate.</p> <p>Most books about Fermi estimates are filled with examples done by Fermi estimate experts, and in many cases the estimates were probably adjusted after the author looked up the true answers. This post is different. My examples below are estimates I made <em>before</em> looking up the answer online, so you can get a realistic picture of how this works from someone who isn't "cheating." 
Also, there will be no selection effect: I'm going to do four Fermi estimates for this post, and I'm not going to throw out my estimates if they are way off. Finally, I'm not all that practiced doing "Fermis" myself, so you'll get to see what it's like for a relative newbie to go through the process. In short, I hope to give you a realistic picture of what it's like to do Fermi practice when you're just getting started.</p> <p>&nbsp;</p> <h3>Example 1: How many new passenger cars are sold each year in the USA?</h3> <p><img style="float: right; padding: 10px;" src="http://i.imgur.com/aMD9a4s.jpg" alt="" />The classic Fermi problem is "How many piano tuners are there in Chicago?" This kind of estimate is useful if you want to know the approximate size of the customer base for a new product you might develop, for example. But I'm not sure anyone knows how many piano tuners there <em>really</em> are in Chicago, so let's try a different one we probably <em>can</em> look up later: "How many new passenger cars are sold each year in the USA?"</p> <p>As with all Fermi problems, there are many different models we could build. For example, we could estimate how many new cars a dealership sells per month, and then we could estimate how many dealerships there are in the USA. Or we could try to estimate the annual demand for new cars from the country's population. Or, if we happened to have read how many Toyota Corollas were sold last year, we could try to build our estimate from there.</p> <p>The second model looks more robust to me than the first, since I know roughly how many Americans there are, but I have no idea how many new-car dealerships there are. Still, let's try it both ways. (I <em>don't</em> happen to know how many new Corollas were sold last year.)</p> <h4>Approach #1: Car dealerships</h4> <p>How many new cars does a dealership sell per month, on average? Oofta, I dunno. To support the dealership's existence, I assume it has to be at least 5. But it's probably not more than 50, since most dealerships are in small towns that don't get much action. To get my point estimate, I'll take the AGM of 5 and 50. 5 is 5&times;10<sup>0</sup>, and 50 is 5&times;10<sup>1</sup>. Our exponents sum to an odd number, so I'll round the exponent down to 0 and multiply the final answer by 3. So, my estimate of how many new cars a new-car dealership sells per month is 3(5&times;10<sup>0</sup>) = 15.</p> <p>Now, how many new-car dealerships are there in the USA? This could be tough. I know several towns of only 10,000 people that have 3 or more new-car dealerships. I don't recall towns much smaller than that having new-car dealerships, so let's exclude them. How many cities of 10,000 people or more are there in the USA? I have no idea. So let's decompose this problem a bit more.</p> <p>How many <em>counties</em> are there in the USA? I remember seeing a map of counties colored by which national ancestry was dominant in that county. (Germany was the most common.) Thinking of that map, there were definitely more than 300 counties on it, and definitely less than 20,000. What's the AGM of 300 and 20,000? Well, 300 is 3&times;10<sup>2</sup>, and 20,000 is 2&times;10<sup>4</sup>. The average of coefficients 3 and 2 is 2.5, and the average of exponents 2 and 4 is 3. So the AGM of 300 and 20,000 is 2.5&times;10<sup>3</sup> = 2500.</p> <p>Now, how many towns of 10,000 people or more are there per county? I'm pretty sure the average must be larger than 10 and smaller than 5000. The AGM of 10 and 5000 is 300. 
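</p> <p>If you'd rather let a computer grind through the AGM arithmetic, here is a minimal Python sketch of the AGM recipe from the estimation tips above. The function name and the exact handling of the odd-exponent case are just one way to render that recipe in code (they don't come from any standard library), and the sketch assumes both bounds are positive:</p> <pre><code>import math

# A sketch of the "approximate geometric mean" (AGM) recipe described
# in the estimation tips above; not code from any estimation library.
def approximate_geometric_mean(lower, upper):
    """AGM of two positive bounds.

    Write each bound as coefficient * 10**exponent (coefficient between
    1 and 10), average the coefficients and the exponents, and if the
    averaged exponent is not a whole number, round it down and multiply
    the result by three.
    """
    def split(x):
        exponent = math.floor(math.log10(x))
        return x / 10**exponent, exponent

    (c1, e1), (c2, e2) = split(lower), split(upper)
    coefficient = (c1 + c2) / 2
    exponent = (e1 + e2) / 2
    if exponent == int(exponent):
        return coefficient * 10**exponent
    return 3 * coefficient * 10**math.floor(exponent)

# The worked examples from this post:
print(approximate_geometric_mean(2, 400))    # 30.0  (exact geometric mean: ~28.3)
print(approximate_geometric_mean(20, 400))   # 90.0  (exact: ~89.4)
print(approximate_geometric_mean(5, 50))     # 15.0  (exact: ~15.8)
print(approximate_geometric_mean(10, 5000))  # 300.0 (exact: ~223.6)
</code></pre> <p>The printed values match the hand calculations in this post, and the exact geometric means in the comments show how rough the approximation can be. 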
(I won't include this calculation in the text anymore; you know how to do it.)</p> <p>Finally, how many car dealerships are there in cities of 10,000 or more people, on average? Most such towns are pretty small, and probably have 2-6 car dealerships. The largest cities will have many more: maybe 100-ish. So I'm pretty sure the average number of car dealerships in cities of 10,000 or more people must be between 2 and 30. The AGM of 2 and 30 is 7.5.</p> <p>Now I just multiply my estimates:</p> <p>[15 new cars sold per month per dealership] &times; [12 months per year] &times; [7.5 new-car dealerships per city of 10,000 or more people] &times; [300 cities of 10,000 or more people per county] &times; [2500 counties in the USA] = 1,012,500,000.</p> <p>A sanity check immediately invalidates this answer. There's no way that 300 million American citizens buy a <em>billion</em> new cars per year. I suppose they <em>might</em> buy 100 million new cars per year, which would be within a factor of 10 of my estimate, but I doubt it.</p> <p>As I suspected, my first approach was problematic. Let's try the second approach, starting from the population of the USA.</p> <h4>Approach #2: Population of the USA</h4> <p>There are about 300 million Americans. How many of them own a car? Maybe 1/3 of them, since children don't own cars, many people in cities don't own cars, and many households share a car or two between the adults in the household.</p> <p>Of the 100 million people who own a car, how many of them bought a <em>new</em> car in the past 5 years? Probably less than half; most people buy used cars, right? So maybe 1/4 of car owners bought a new car in the past 5 years, which means 1 in 20 car owners bought a new car in the past <em>year</em>.</p> <p>100 million / 20 = 5 million new cars sold each year in the USA. That doesn't seem crazy, though perhaps a bit low. I'll take this as my estimate.</p> <p>Now is your last chance to try this one on your own; in the next paragraph I'll reveal the true answer.</p> <p>&hellip;</p> <p>&hellip;</p> <p>...</p> <p>Now, I Google <a href="https://www.google.com/#hl=en&amp;output=search&amp;sclient=psy-ab&amp;q=new+cars+sold+per+year+in+the+USA&amp;oq=new+cars+sold+per+year+in+the+USA">new cars sold per year in the USA</a>. Wikipedia is the first result, and it <a href="http://en.wikipedia.org/wiki/Passenger_vehicles_in_the_United_States#Sales">says</a> "In the year 2009, about 5.5 million new passenger cars were sold in the United States according to the U.S. Department of Transportation."</p> <p>Boo-yah!</p> <p>&nbsp;</p> <h3>Example 2: How many fatalities from passenger-jet crashes have there been in the past 20 years?</h3> <p><img style="float: right; padding: 10px;" src="http://i.imgur.com/TyoL5r1.jpg" alt="" />Again, there are multiple models I could build. I could try to estimate how many passenger-jet flights there are per year, and then try to estimate the frequency of crashes and the average number of fatalities per crash. Or I could just try to guess the total number of passenger-jet crashes around the world per year and go from there.</p> <p>As far as I can tell, passenger-jet crashes (with fatalities) almost always make it on the TV news and (more relevant to me) the front page of Google News. Exciting footage and multiple deaths will do that. 
So working just from memory, it feels to me like there are about 5 passenger-jet crashes (with fatalities) per year, so maybe there were about 100 passenger jet crashes with fatalities in the past 20 years.</p> <p>Now, how many fatalities per crash? From memory, it seems like there are usually two kinds of crashes: ones where <em>everybody</em> dies (meaning: about 200 people?), and ones where only about 10 people die. I think the "everybody dead" crashes are less common, maybe 1/4 as common. So the average crash with fatalities should cause (200&times;1/4)+(10&times;3/4) = 50+7.5 &asymp; 60 fatalities, by the spherical cow principle.</p> <p>60 fatalities per crash &times; 100 crashes with fatalities over the past 20 years = 6000 passenger fatalities from passenger-jet crashes in the past 20 years.</p> <p>Last chance to try this one on your own...</p> <p>&hellip;</p> <p>&hellip;</p> <p>&hellip;</p> <p>A Google search again brings me to Wikipedia, which reveals that an organization called ACRO <a href="http://en.wikipedia.org/wiki/Aviation_accidents_and_incidents#Aircraft_Crashes_Record_Office_.28ACRO.29">records</a> the number of airline fatalities each year. Unfortunately for my purposes, they include fatalities from cargo flights. After more Googling, I tracked down Boeing's "<a href="http://www.boeing.com/news/techissues/pdf/statsum.pdf">Statistical Summary of Commercial Jet Airplane Accidents, 1959-2011</a>," but that report excludes jets lighter than 60,000 pounds, and excludes crashes caused by hijacking or terrorism.</p> <p>It appears it would be a major research project to figure out the true answer to our question, but let's at least estimate it from the ACRO data. Luckily, ACRO has statistics on which percentage of accidents are from passenger and other kinds of flights, which I'll take as a proxy for which percentage of <em>fatalities</em> are from different kinds of flights. According to <a href="http://www.baaa-acro.com/Statistiques%20diverses.htm">that page</a>, 35.41% of accidents are from "regular schedule" flights, 7.75% of accidents are from "private" flights, 5.1% of accidents are from "charter" flights, and 4.02% of accidents are from "executive" flights. I think that captures what I had in mind as "passenger-jet flights." So we'll guess that 52.28% of fatalities are from "passenger-jet flights." I won't round this to 50% because we're not doing a Fermi estimate right now; we're trying to <em>check</em> a Fermi estimate.</p> <p>According to ACRO's <a href="http://www.baaa-acro.com/archives/Accidents.htm">archives</a>, there were 794 fatalities in 2012, 828 fatalities in 2011, and... well, from 1993-2012 there were a total of 28,021 fatalities. And 52.28% of that number is 14,649.</p> <p>So my estimate of 6000 was off by less than a factor of 3!</p> <p>&nbsp;</p> <h3>Example 3: How much does the New York state government spend on K-12 education every year?</h3> <p><img style="float: right; padding: 10px;" src="http://i.imgur.com/N5zf5d3.jpg" alt="" />How might I estimate this? First I'll estimate the number of K-12 students in New York, and then I'll estimate how much this should cost.</p> <p>How many people live in New York? I seem to recall that NYC's greater metropolitan area is about 20 million people. That's probably most of the state's population, so I'll guess the total is about 30 million.</p> <p>How many of those 30 million people attend K-12 public schools? 
I can't remember what the United States' <a href="http://en.wikipedia.org/wiki/Population_pyramid">population pyramid</a> looks like, but I'll guess that about 1/6 of Americans (and hopefully New Yorkers) attend K-12 at any given time. So that's 5 million kids in K-12 in New York. The number attending private schools probably isn't large enough to matter for factor-of-10 estimates.</p> <p>How much does a year of K-12 education cost for one child? Well, I've heard teachers don't get paid much, so after benefits and taxes and so on I'm guessing a teacher costs about $70,000 per year. How big are class sizes these days, 30 kids? By the spherical cow principle, that's about $2,000 per child, per year on teachers' salaries. But there are lots of other expenses: buildings, transport, materials, support staff, etc. And maybe some money goes to private schools or other organizations. Rather than estimate all those things, I'm just going to guess that about $10,000 is spent per child, per year.</p> <p>If that's right, then New York spends $50 billion per year on K-12 education.</p> <p>Last chance to make your own estimate!</p> <p>&hellip;</p> <p>&hellip;</p> <p>&hellip;</p> <p>Before I did the Fermi estimate, I had <a href="http://www.juliagalef.com/">Julia Galef</a> check Google to find this statistic, but she didn't give me any hints about the number. Her two sources were <a href="http://www.wolframalpha.com/input/?i=new+york+state+spending+education+k-12">Wolfram Alpha</a> and a <a href="http://governor.ny.gov/citizenconnects/?q=content/web-chat-deputy-secretary-education-david-wakelyn">web chat</a> with New York's Deputy Secretary for Education, both of which put the figure at approximately $53 billion.</p> <p>Which is definitely within a factor of 10 from $50 billion. :)</p> <p>&nbsp;</p> <h3>Example 4: How many plays of My Bloody Valentine's "Only Shallow" have been reported to last.fm?</h3> <p><img style="float: right; padding: 10px;" src="http://i.imgur.com/sr8R6T5.jpg" alt="" /><a href="http://en.wikipedia.org/wiki/Last.fm">Last.fm</a> makes a record of every audio track you play, if you enable the relevant feature or plugin for the music software on your phone, computer, or other device. Then, the service can show you charts and statistics about your listening patterns, and make personalized music recommendations from them. My own charts are <a href="http://www.last.fm/user/lukeprog/charts">here</a>. (Chuck Wild / <a href="http://www.liquidmindmusic.com/">Liquid Mind</a> dominates my charts because I used to listen to that artist while sleeping.)</p> <p>My Fermi problem is: How many plays of "<a href="http://www.youtube.com/watch?v=FyYMzEplnfU">Only Shallow</a>" have been reported to last.fm?</p> <p>My Bloody Valentine is a popular "indie" rock band, and "Only Shallow" is probably one of their most popular tracks. How can I estimate how many plays it has gotten on last.fm?</p> <p>What do I know that might help?</p> <ul> <li>I know last.fm is popular, but I don't have a sense of whether they have 1 million users, 10 million users, or 100 million users. </li> <li>I accidentally saw on Last.fm's Wikipedia page that just over 50 billion track plays have been recorded. We'll consider that to be one piece of data I looked up to help with my estimate. </li> <li>I seem to recall reading that major music services like iTunes and Spotify have about 10 million tracks. 
Since last.fm records songs that people play from their private collections, whether or not they exist in popular databases, I'd guess that the total number of different tracks named in last.fm's database is an order of magnitude larger: about 100 million tracks.</li> </ul> <p>I would guess that track plays obey a <a href="http://en.wikipedia.org/wiki/Power_law">power law</a>, with the most popular tracks getting vastly more plays than tracks of average popularity. I'd also guess that there are maybe 10,000 tracks more popular than "Only Shallow."</p> <p>Next, I simulated being good at math by having <a href="/user/Qiaochu_Yuan/overview/">Qiaochu Yuan</a> show me how to do the calculation. I also allowed myself to use a calculator. Here's what we do:</p> <blockquote> <p>Plays(rank) = C/(rank<sup>P</sup>)</p> </blockquote> <p>P is the exponent for the power law, and C is the proportionality constant. We'll guess that P is 1, a common power law exponent for empirical data. And we calculate C like so:</p> <blockquote> <p>C &asymp; [total plays]/ln(total songs) &asymp; 2.5 billion</p> </blockquote> <p>So now, assuming the song's rank is 10,000, we have:</p> <blockquote> <p>Plays(10<sup>4</sup>) = 2.5&times;10<sup>9</sup>/(10<sup>4</sup>)</p> <p>Plays("Only Shallow") = 250,000</p> </blockquote> <p>That seems high, but let's roll with it. Last chance to make your own estimate!</p> <p>&hellip;</p> <p>&hellip;</p> <p>...</p> <p>And when I <a href="http://www.last.fm/music/My+Bloody+Valentine/_/Only+Shallow">check the answer</a>, I see that "Only Shallow" has about 2 million plays on last.fm.</p> <p>My answer was off by less than a factor of 10, which for a Fermi estimate is called <em>victory</em>!</p> <p>Unfortunately, last.fm doesn't publish all-time track rankings or other data that might help me to determine which parts of my model were correct and incorrect.</p> <p>&nbsp;</p> <h3>Further examples</h3> <p>I focused on examples that are similar in structure to the kinds of quantities that entrepreneurs and CEOs might want to estimate, but of course there are all kinds of things one can estimate this way. Here's a sampling of Fermi problems featured in various books and websites on the subject:</p> <p><a href="http://www.fermiquestions.com/">Play Fermi Questions</a>: 2100 Fermi problems and counting.</p> <p><em><a href="http://www.amazon.com/Guesstimation-Solving-Worlds-Problems-Cocktail/dp/0691129495/">Guesstimation</a></em> (2008): If all the humans in the world were crammed together, how much area would we require? What would be the mass of all 10<sup>8</sup> MongaMillions lottery tickets? On average, how many people are airborne over the US at any given moment? How many cells are there in the human body? How many people in the world are picking their nose right now? What are the relative costs of fuel for NYC rickshaws and automobiles?</p> <p><em><a href="http://www.amazon.com/Guesstimation-2-0-Solving-Todays-Problems/dp/069115080X/">Guesstimation 2.0</a></em> (2011): If we launched a trillion one-dollar bills into the atmosphere, what fraction of sunlight hitting the Earth could we block with those dollar bills? If a million monkeys typed randomly on a million typewriters for a year, what is the longest string of consecutive correct letters of <em>The Cat in the Hat</em> (starting from the beginning) that they would likely type? How much energy does it take to crack a nut? 
If an airline asked its passengers to urinate before boarding the airplane, how much fuel would the airline save per flight? What is the radius of the largest rocky sphere from which we can reach escape velocity by jumping?</p> <p><em><a href="http://www.amazon.com/How-Many-Licks-Estimate-Anything/dp/0762435607/">How Many Licks?</a></em> (2009): What fraction of Earth's volume would a <a href="http://en.wikipedia.org/wiki/Mole_(unit)">mole</a> of hot, sticky, chocolate-jelly doughnuts be? How many miles does a person walk in a lifetime? How many times can you outline the continental US in shoelaces? How long would it take to read every book in the library? How long can you shower and still make it more environmentally friendly than taking a bath?</p> <p><em><a href="http://www.amazon.com/Ballparking-Practical-Impractical-Sports-Questions/dp/0762443456/">Ballparking</a></em> (2012): How many bolts are in the floor of the Boston Garden basketball court? How many lanes would you need for the outermost lane of a running track to be the length of a marathon? How hard would you have to hit a baseball for it to never land?</p> <p><a href="http://www.physics.umd.edu/perg/fermi/fermi.htm">University of Maryland Fermi Problems Site</a>: How many sheets of letter-sized paper are used by all students at the University of Maryland in one semester? How many blades of grass are in the lawn of a typical suburban house in the summer? How many golf balls can be fit into a typical suitcase?</p> <p><a href="http://www.stupidcalculations.com/">Stupid Calculations</a>: a blog of silly-topic Fermi estimates.</p> <p>&nbsp;</p> <h3>Conclusion</h3> <p>Fermi estimates can help you become more efficient in your day-to-day life, and give you increased confidence in the decisions you face. If you want to become proficient in making Fermi estimates, I recommend practicing them 30 minutes per day for three months. In that time, you should be able to make about (2 Fermis per day)&times;(90 days) = 180 Fermi estimates.</p> <p>If you'd like to write down your estimation attempts and then publish them here, please do so as a reply to <a href="/lw/h5e/fermi_estimates/8ppa">this comment</a>. One Fermi estimate per comment, please!</p> <p>Alternatively, post your Fermi estimates to the <a href="http://www.reddit.com/r/estimation/">dedicated subreddit</a>.</p> <p><em>Update 03/06/2017: I keep getting requests from professors to use this in their classes, so: I license anyone to use this article noncommercially, so long as its authorship is noted (me = Luke Muehlhauser).</em></p> lukeprog PsEppdvgRisz5xAHG 2013-04-11T17:52:28.708Z Explicit and tacit rationality https://www.lesswrong.com/posts/NLJ6NyHFZPJ2oNSZ8/explicit-and-tacit-rationality <p><a href="http://web.archive.org/web/20010205221413/http://sysopmind.com/eliezer.html">Like Eliezer</a>, I "do my best thinking into a keyboard." It starts with a <a href="http://wiki.lesswrong.com/wiki/Curiosity">burning itch</a> to figure something out. I collect ideas and arguments and evidence and sources. I arrange them, tweak them, criticize them. I explain it all in my own words so I can understand it better. By then it is nearly something that others would want to read, so I clean it up and publish, say, <a href="/lw/3w3/how_to_beat_procrastination/">How to Beat Procrastination</a>. 
I write <a href="http://www.paulgraham.com/essay.html">essays</a> in the <a href="http://is.gd/FaHjkf">original sense</a> of the word: "attempts."</p> <p>This time, I'm trying to figure out something we might call "tacit rationality" (c.f. <a href="http://en.wikipedia.org/wiki/Tacit_knowledge">tacit knowledge</a>).</p> <p>I tried and failed to write a <em>good</em> post about tacit rationality, so I wrote a <em>bad</em> post instead &mdash; one that is basically a patchwork of somewhat-related musings on explicit and tacit rationality. Therefore I'm posting this article to LW Discussion. I hope the ensuing discussion ends up leading somewhere with more clarity and usefulness.</p> <p>&nbsp;</p> <h3>Three methods for training rationality</h3> <p>Which of these three options do you think will train rationality (i.e. <a href="/lw/7i/rationality_is_systematized_winning/">systematized winning</a>, or "winning-rationality") most effectively?</p> <ol> <li>Spend one year reading and re-reading <em><a href="http://wiki.lesswrong.com/wiki/Sequences">The Sequences</a></em>, studying the math and cognitive science of rationality, and discussing rationality online and at Less Wrong meetups.</li> <li>Attend a <a href="http://appliedrationality.org/workshops/">CFAR workshop</a>, then spend the next year practicing those skills and <a href="/lw/fc3/checklist_of_rationality_habits/">other rationality habits</a> every week.</li> <li>Run a startup or small business for one year.</li> </ol> <p>Option 1 seems to be pretty effective at training people to talk intelligently <em>about</em> rationality (let's call that "talking-rationality"), and it seems to inoculate people against some common philosophical mistakes.</p> <p>We don't yet have any examples of someone doing Option 2 (the first CFAR workshop was May 2012), but I'd expect Option 2 &mdash; if actually executed &mdash; to result in more winning-rationality than Option 1, and also a modicum of talking-rationality.</p> <p>What about Option 3? Unlike Option 2 or especially Option 1, I'd expect it to train almost no ability to talk intelligently about rationality. But I <em>would</em> expect it to result in relatively good winning-rationality, due to its tight feedback loops.</p> <p>&nbsp;</p> <h3>Talking-rationality and winning-rationality can come apart</h3> <blockquote> <p>I've come to believe... that the best way to succeed is to discover what you love and then find a way to offer it to others in the form of service, working hard, and also allowing the energy of the universe to lead you.</p> </blockquote> <p align="right"><a href="http://www.quotationspage.com/quote/33464.html">Oprah Winfrey</a></p> <p>Oprah isn't known for being a rational thinker. She is a known <a href="http://www.thedailybeast.com/newsweek/2009/05/29/live-your-best-life-ever.print.html">peddler of pseudoscience</a>, and she attributes her success (in part) to allowing "the energy of the universe" to lead her.</p> <p>Yet she must be doing <em>something</em> right. Oprah is a true rags-to-riches story. Born in Mississippi to an unwed teenage housemaid, she was so poor she wore dresses made of potato sacks. She was molested by a cousin, an uncle, and a family friend. She became pregnant at age 14.</p> <p>But in high school she became an honors student, won oratory contests and a beauty pageant, and was hired by a local radio station to report the news. 
She became the youngest-ever news anchor at Nashville's WLAC-TV, then hosted several shows in Baltimore, then moved to Chicago and within months her own talk show shot from last place to first place in the ratings there. Shortly afterward her show went national. She also produced and starred in several TV shows, was nominated for an Oscar for her role in a Steven Spielberg movie, launched her own TV cable network and her own magazine (the "most successful startup ever in the [magazine] industry" according to <em><a href="http://money.cnn.com/magazines/fortune/fortune_archive/2002/04/01/320634/">Fortune</a></em>), and became the world's first female black billionaire.</p> <p>I'd like to suggest that Oprah's climb probably didn't come <em>merely</em> through inborn talent, hard work, and luck. To get from potato sack dresses to the Forbes billionaire list, Oprah had to make thousands of pretty good decisions. She had to make pretty accurate guesses about the likely consequences of various actions she could take. When she was wrong, she had to correct course fairly quickly. In short, she had to be fairly <em>rational</em>, at least in some domains of her life.</p> <p>Similarly, I know plenty of business managers and entrepreneurs who have a steady track record of good decisions and wise judgments, and yet they are religious, or they commit basic errors in logic and probability when they talk about non-business subjects.</p> <p>What's going on here? My guess is that successful entrepreneurs and business managers and other people must have pretty good <em>tacit rationality</em>, even if they aren't very proficient with the "rationality" concepts that Less Wrongers tend to discuss on a daily basis. Stated another way, successful businesspeople make fairly rational decisions and judgments, even though they may confabulate rather silly <em>explanations</em> for their success, and even though they don't understand the math or science of rationality well.</p> <p>LWers can probably outperform Mark Zuckerberg on the CRT and the Berlin Numeracy Test, but Zuckerberg is laughing at them from atop a huge pile of utility.</p> <p>&nbsp;</p> <h3>Explicit and tacit rationality</h3> <p>Patri Friedman, in <a href="/lw/2po/selfimprovement_or_shiny_distraction_why_less/">Self-Improvement or Shiny Distraction: Why Less Wrong is anti-Instrumental Rationality</a>, reminded us that skill acquisition comes from <a href="https://www.google.com/search?q=%22deliberate+practice%22&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">deliberate practice</a>, and reading LW is a "shiny distraction," not deliberate practice. He said a <em>real</em> rationality practice would look more like... well, what Patri describes <a href="/lw/h5t/new_applied_rationality_workshops_april_may_and/8q54">is basically CFAR</a>, though CFAR didn't exist at the time.</p> <p>In response, and again long before CFAR existed, Anna Salamon wrote <a href="/lw/34a/goals_for_which_less_wrong_does_and_doesnt_help/">Goals for which Less Wrong does (and doesn't) help</a>. Summary: Some domains provide rich, cheap feedback, so you don't need much LW-style rationality to become successful in those domains. But many of us have goals in domains that don't offer rapid feedback: e.g. whether to buy cryonics, which 40-year investments are safe, which metaethics to endorse. For this kind of thing you need LW-style rationality. 
(We could also state this as "Domains with rapid feedback train tacit rationality with respect to those domains, but for domains without rapid feedback you've got to do the best you can with LW-style 'explicit rationality'.")</p> <p>The good news is that you should be able to combine explicit and tacit rationality. Explicit rationality can help you realize that you should force tight feedback loops into whichever domains you want to succeed in, so that you can develop good intuitions about how to succeed in those domains. (See also: <a href="http://www.amazon.com/The-Lean-Startup-Entrepreneurs-Continuous/dp/0307887898/">Lean Startup</a> or <a href="http://intelligence.org/2013/04/04/the-lean-nonprofit/">Lean Nonprofit</a> methods.)</p> <p>Explicit rationality could also help you realize that the cognitive biases most-discussed in the literature aren't necessarily the ones you should focus on ameliorating, as Aaron Swartz <a href="/lw/di4/reply_to_holden_on_the_singularity_institute/75an">wrote</a>:</p> <blockquote> <p>Cognitive biases cause people to make choices that are <em>most obviously</em> irrational, but not <em>most importantly</em> irrational... Since cognitive biases are the primary focus of research into rationality, rationality tests mostly measure how good you are at avoiding them... LW readers tend to be fairly good at avoiding cognitive biases... But there's a whole series of much more important irrationalities that LWers suffer from. (Let's call them "practical biases" as opposed to "cognitive biases," even though both are ultimately practical and cognitive.)</p> <p>...Rationality, properly understood, is in fact a predictor of success. Perhaps if LWers used success as their metric (as opposed to getting better at avoiding obvious mistakes), they might focus on their most important irrationalities (instead of their most obvious ones), which would lead them to be more rational and more successful.</p> </blockquote> <h3><br /></h3> <h3>Final scattered thoughts</h3> <ul> <li>If someone is consistently winning, and not just because they have tons of wealth or fame, then maybe you should conclude they have pretty good tacit rationality even if their explicit rationality is terrible. </li> <li>The positive effects of tight feedback loops might trump the effects of explicit rationality training. </li> <li>Still, I suspect explicit rationality <em>plus</em> tight feedback loops could lead to the best results of all. </li> <li>I really hope we can develop a real <a href="/lw/h5t/new_applied_rationality_workshops_april_may_and/8q54">rationality dojo</a>.
</li> <li>If you're reading this post, you're probably spending too <em>much</em> time reading Less Wrong, and too <em>little</em> time <a href="http://www.amazon.com/The-Motivation-Hacker-ebook/dp/B00C8N4FNK/">hacking your motivation system</a>, <a href="/lw/5p6/how_and_why_to_granularize/">learning social skills</a>, and <a href="http://www.amazon.com/The-Lean-Startup-Entrepreneurs-Continuous/dp/0307887898/">learning</a> how to inject tight feedback loops into everything you can.</li> </ul> lukeprog NLJ6NyHFZPJ2oNSZ8 2013-04-09T23:33:29.127Z Critiques of the heuristics and biases tradition https://www.lesswrong.com/posts/DAf4W9ZYuzuLaGvd5/critiques-of-the-heuristics-and-biases-tradition <p>The chapter on judgment under uncertainty in the (excellent) new <em><a href="http://www.amazon.com/Oxford-Handbook-Cognitive-Psychology-Library/dp/0195376749/">Oxford Handbook of Cognitive Psychology</a></em> has a handy little section on recent critiques of the "heuristics and biases" tradition. It also discusses problems with the somewhat-competing "fast and frugal heuristics" school of thought, but for now let me just quote the section on heuristics and biases (pp. 608-609):</p> <blockquote> <p>The heuristics and biases program has been highly influential; however, some have argued that in recent years the influence, at least in psychology, has waned (McKenzie, 2005). This waning has been due in part to pointed critiques of the approach (e.g., Gigerenzer, 1996). This critique comprises two main arguments: (1) that by focusing mainly on coherence standards [e.g. their <em>rationality</em> given the subject's other beliefs, as contrasted with <em>correspondence standards</em> having to do with the real-world accuracy of a subject's beliefs] the approach ignores the role played by the environment or the context in which a judgment is made; and (2) that the explanations of phenomena via one-word labels such as availability, anchoring, and representativeness are vague, insufficient, and say nothing about the processes underlying judgment (see Kahneman, 2003; Kahneman &amp; Tversky, 1996 for responses to this critique).</p> <p>The accuracy of some of the heuristics proposed by Tversky and Kahneman can be compared to correspondence criteria (availability and anchoring). Thus, arguing that the tradition only uses the &ldquo;narrow norms&rdquo; (Gigerenzer, 1996) of coherence criteria is not strictly accurate (cf. Dunwoody, 2009). Nonetheless, responses in famous examples like the Linda problem can be reinterpreted as sensible rather than erroneous if one uses conversational or pragmatic norms rather than those derived from probability theory (Hilton, 1995). For example, Hertwig, Benz and Krauss (2008) asked participants which of the following two statements is more probable:</p> <blockquote> <p>[X] The percentage of adolescent smokers in Germany decreases at least 15% from current levels by September 1, 2003.</p> <p>[X&amp;Y] The tobacco tax in Germany is increased by 5 cents per cigarette and the percentage of adolescent smokers in Germany decreases at least 15% from current levels by September 1, 2003.</p> </blockquote> <p>According to the conjunction rule, [X&amp;Y cannot be more probable than X] and yet the majority of participants ranked the statements in that order. 
However, when subsequently asked to rank order four statements in order of how well each one described their understanding of X&amp;Y, there was an overwhelming tendency to rank statements like &ldquo;X and therefore Y&rdquo; or &ldquo;X and X is the cause for Y&rdquo; higher than the simple conjunction &ldquo;X and Y.&rdquo; Moreover, the minority of participants who did not commit the conjunction fallacy in the first judgment showed internal coherence by ranking &ldquo;X and Y&rdquo; as best describing their understanding in the second judgment.These results suggest that people adopt a causal understanding of the statements, in essence ranking the probability of X, given Y as more probable than X occurring alone. If so, then arguably the conjunction &ldquo;error&rdquo; is no longer incorrect. (See Moro, 2009 for extensive discussion of the reasons underlying the conjunction fallacy, including why &ldquo;misunderstanding&rdquo; cannot explain all instances of the fallacy.)</p> <p>The &ldquo;vagueness&rdquo; argument can be illustrated by considering two related phenomena: the gambler&rsquo;s fallacy and the hot-hand (Gigerenzer &amp; Brighton, 2009). The gambler&rsquo;s fallacy is the tendency for people to predict the opposite outcome after a run of the same outcome (e.g., predicting heads after a run of tails when flipping a fair coin); the hot-hand, in contrast, is the tendency to predict a run will continue (e.g., a player making a shot in basketball after a succession of baskets; Gilovich, Vallone, &amp; Tversky, 1985). Ayton and Fischer (2004) pointed out that although these two behaviors are opposite - ending or continuing runs - they have both been explained via the label &ldquo;representativeness.&rdquo; In both cases a faulty concept of randomness leads people to expect short sections of a sequence to be &ldquo;representative&rdquo; of their generating process. In the case of the coin, people believe (erroneously) that long runs should not occur, so the opposite outcome is predicted; for the player, the presence of long runs rules out a random process so a continuation is predicted (Gilovich et al., 1985). The &ldquo;representativeness&rdquo; explanation is therefore incomplete without specifying a priori which of the opposing prior expectations will result. More important, representativeness alone does not explain <em>why</em> people have the misconception that random sequences should exhibit local representativeness when in reality they do not (Ayton &amp; Fischer, 2004).</p> </blockquote> <p>&nbsp;</p> <p><small>My thanks to MIRI intern Stephen Barnes for transcribing this text.</small></p> lukeprog DAf4W9ZYuzuLaGvd5 2013-03-18T23:49:57.035Z Decision Theory FAQ https://www.lesswrong.com/posts/zEWJBFFMvQ835nq6h/decision-theory-faq <p><small>Co-authored with <a href="/user/crazy88/overview/">crazy88</a>. Please let us know when you find mistakes, and we'll fix them. Last updated 03-27-2013.</small></p> <p><strong>Contents</strong>:</p> <div id="TOC"> <ul> <li><a href="#what-is-decision-theory">1. What is decision theory?</a></li> <li><a href="#is-the-rational-decision-always-the-right-decision">2. Is the rational decision always the right decision?</a></li> <li><a href="#how-can-i-better-understand-a-decision-problem">3. How can I better understand a decision problem?</a></li> <li><a href="#how-can-i-measure-an-agents-preferences">4. How can I measure an agent's preferences?</a> <ul> <li><a href="#the-concept-of-utility">4.1. 
The concept of utility</a></li> <li><a href="#types-of-utility">4.2. Types of utility</a></li> </ul> </li> <li><a href="#what-do-decision-theorists-mean-by-risk-ignorance-and-uncertainty">5. What do decision theorists mean by "risk," "ignorance," and "uncertainty"?</a></li> <li><a href="#how-should-i-make-decisions-under-ignorance">6. How should I make decisions under ignorance?</a> <ul> <li><a href="#the-dominance-principle">6.1. The dominance principle</a></li> <li><a href="#maximin-and-leximin">6.2. Maximin and leximin</a></li> <li><a href="#maximax-and-optimism-pessimism">6.3. Maximax and optimism-pessimism</a></li> <li><a href="#other-decision-principles">6.4. Other decision principles</a></li> </ul> </li> <li><a href="#can-decisions-under-ignorance-be-transformed-into-decisions-under-uncertainty">7. Can decisions under ignorance be transformed into decisions under uncertainty?</a></li> <li><a href="#how-should-i-make-decisions-under-uncertainty">8. How should I make decisions under uncertainty?</a> <ul> <li><a href="#the-law-of-large-numbers">8.1. The law of large numbers</a></li> <li><a href="#the-axiomatic-approach">8.2. The axiomatic approach</a></li> <li><a href="#the-von-neumann-morgenstern-utility-theorem">8.3. The Von Neumann-Morgenstern utility theorem</a></li> <li><a href="#vnm-utility-theory-and-rationality">8.4. VNM utility theory and rationality</a></li> <li><a href="#objections-to-vnm-rationality">8.5. Objections to VNM-rationality</a></li> <li><a href="#should-we-accept-the-vnm-axioms">8.6. Should we accept the VNM axioms?</a></li> </ul> </li> <li><a href="#does-axiomatic-decision-theory-offer-any-action-guidance">9. Does axiomatic decision theory offer any action guidance?</a></li> <li><a href="#how-does-probability-theory-play-a-role-in-decision-theory">10. How does probability theory play a role in decision theory?</a> <ul> <li><a href="#the-basics-of-probability-theory">10.1. The basics of probability theory</a></li> <li><a href="#bayes-theorem-for-updating-probabilities">10.2. Bayes theorem for updating probabilities</a></li> <li><a href="#how-should-probabilities-be-interpreted">10.3. How should probabilities be interpreted?</a></li> </ul> </li> <li><a href="#what-about-newcombs-problem-and-alternative-decision-algorithms">11. What about "Newcomb's problem" and alternative decision algorithms?</a> <ul> <li><a href="#newcomblike-problems-and-two-decision-algorithms">11.1. Newcomblike problems and two decision algorithms</a></li> <li><a href="#benchmark-theory-bt">11.2. Benchmark theory (BT)</a></li> <li><a href="#timeless-decision-theory-tdt">11.3. Timeless decision theory (TDT)</a></li> <li><a href="#decision-theory-and-winning">11.4. Decision theory and &ldquo;winning&rdquo;</a></li> </ul> </li> </ul> </div> <h2 id="what-is-decision-theory"><br /></h2> <h2><a href="#what-is-decision-theory">1. What is decision theory?</a></h2> <p><em>Decision theory</em>, also known as <em>rational choice theory</em>, concerns the study of preferences, uncertainties, and other issues related to making "optimal" or "rational" choices. 
It has been discussed by economists, psychologists, philosophers, mathematicians, statisticians, and computer scientists.</p> <p>We can divide decision theory into three parts (<a href="http://www.owlnet.rice.edu/~econ501/lectures/Decision_EU.pdf">Grant &amp; Zandt 2009</a>; <a href="http://www.amazon.com/dp/0521680433/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Baron 2008</a>). <em>Normative</em> decision theory studies what an ideal agent (a perfectly rational agent, with infinite computing power, etc.) would choose. <em>Descriptive</em> decision theory studies how non-ideal agents (e.g. humans) <em>actually</em> choose. <em>Prescriptive</em> decision theory studies how non-ideal agents can improve their decision-making (relative to the normative model) despite their imperfections.</p> <p>For example, one's <em>normative</em> model might be <a href="http://kleene.ss.uci.edu/lpswiki/index.php/Expected_Utility_Theory">expected utility theory</a>, which says that a rational agent chooses the action with the highest expected utility. Replicated results in psychology <em>describe</em> humans repeatedly <em>failing</em> to maximize expected utility in particular, <a href="http://www.amazon.com/Predictably-Irrational-Revised-Expanded-Edition/dp/0061353248/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">predictable</a> ways: for example, they make some choices based not on potential future benefits but on irrelevant past efforts (the "<a href="http://en.wikipedia.org/wiki/Sunk_costs">sunk cost fallacy</a>"). To help people avoid this error, some theorists <em>prescribe</em> some basic training in microeconomics, which has been shown to reduce the likelihood that humans will commit the sunk costs fallacy (<a href="http://commonsenseatheism.com/wp-content/uploads/2012/08/Larrick-et-al-Teaching-the-use-of-cost-benefit-reasoning-in-everyday-life.pdf">Larrick et al. 1990</a>). Thus, through a coordination of normative, descriptive, and prescriptive research we can help agents to succeed in life by acting more in accordance with the normative model than they otherwise would.</p> <p>This FAQ focuses on normative decision theory. Good sources on descriptive and prescriptive decision theory include <a href="http://www.amazon.com/Rationality-Reflective-Mind-Keith-Stanovich/dp/0195341147/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Stanovich (2010)</a> and <a href="http://www.amazon.com/Rational-Choice-Uncertain-World-Psychology/dp/1412959039/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Hastie &amp; Dawes (2009)</a>.</p> <p>Two related fields beyond the scope of this FAQ are <a href="http://en.wikipedia.org/wiki/Game_theory">game theory</a> and <a href="http://en.wikipedia.org/wiki/Social_choice_theory">social choice theory</a>. Game theory is the study of conflict and cooperation among multiple decision makers, and is thus sometimes called "interactive decision theory." 
Social choice theory is the study of making a collective decision by combining the preferences of multiple decision makers in various ways.</p> <p>This FAQ draws heavily from two textbooks on decision theory: <a href="http://www.amazon.com/Choices-An-Introduction-Decision-Theory/dp/0816614407/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Resnik (1987)</a> and <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009)</a>. It also draws from more recent results in decision theory, published in journals such as <em><a href="http://www.springerlink.com/content/0039-7857">Synthese</a></em> and <em><a href="http://www.springerlink.com/content/0040-5833">Theory and Decision</a></em>.</p> <p><a id="more"></a></p> <h2 id="is-the-rational-decision-always-the-right-decision"><a href="#is-the-rational-decision-always-the-right-decision">2. Is the rational decision always the right decision?</a></h2> <p>No. Peterson (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2009</a>, ch. 1) explains:</p> <blockquote> <p>[In 1700], King Carl of Sweden and his 8,000 troops attacked the Russian army [which] had about ten times as many troops... Most historians agree that the Swedish attack was irrational, since it was almost certain to fail... However, because of an unexpected blizzard that blinded the Russian army, the Swedes won...</p> </blockquote> <blockquote> <p>Looking back, the Swedes' decision to attack the Russian army was no doubt right, since the <em>actual outcome</em> turned out to be success. However, since the Swedes had no <em>good reason</em> for expecting that they were going to win, the decision was nevertheless irrational.</p> </blockquote> <blockquote> <p>More generally speaking, we say that a decision is <em>right</em> if and only if its actual outcome is at least as good as that of every other possible outcome. Furthermore, we say that a decision is <em>rational</em> if and only if the decision maker [<em>aka</em> the "agent"] chooses to do what she has most reason to do at the point in time at which the decision is made.</p> </blockquote> <p>Unfortunately, we cannot know with certainty what the right decision is. Thus, the best we can do is to try to make "rational" or "optimal" decisions based on our preferences and incomplete information.</p> <p>&nbsp;</p> <h2 id="how-can-i-better-understand-a-decision-problem"><a href="#how-can-i-better-understand-a-decision-problem">3. How can I better understand a decision problem?</a></h2> <p>First, we must <em>formalize</em> a decision problem. It usually helps to <em>visualize</em> the decision problem, too.</p> <p>In decision theory, decision rules are only defined relative to a formalization of a given decision problem, and a formalization of a decision problem can be visualized in multiple ways. Here is an example from Peterson (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2009</a>, ch. 2):</p> <blockquote> <p>Suppose... 
that you are thinking about taking out fire insurance on your home. Perhaps it costs $100 to take out insurance on a house worth $100,000, and you ask: Is it worth it?</p> </blockquote> <p>The most common way to formalize a decision problem is to break it into states, acts, and outcomes. When facing a decision problem, the decision maker aims to choose the <em>act</em> that will have the best <em>outcome</em>. But the outcome of each act depends on the <em>state</em> of the world, which is unknown to the decision maker.</p> <p>In this framework, speaking loosely, a state is a part of the world that is not an act (that can be performed now by the decision maker) or an outcome (the question of what, more precisely, states are is a complex question that is beyond the scope of this document). Luckily, not all states are relevant to a particular decision problem. We only need to take into account states that affect the agent's preference among acts. A simple formalization of the fire insurance problem might include only two states: the state in which your house doesn't (later) catch on fire, and the state in which your house <em>does</em> (later) catch on fire.</p> <p>Presumably, the agent prefers some outcomes to others. Suppose the four conceivable outcomes in the above decision problem are: (1) House and $0, (2) House and -$100, (3) No house and $99,900, and (4) No house and $0. In this case, the decision maker might prefer outcome 1 over outcome 2, outcome 2 over outcome 3, and outcome 3 over outcome 4. (We'll discuss measures of value for outcomes in the next section.)</p> <p>An act is commonly taken to be a function that takes one set of the possible states of the world as input and gives a particular outcome as output. For the above decision problem we could say that if the act "Take out insurance" has the world-state "Fire" as its input, then it will give the outcome "No house and $99,900" as its output.</p> <div class="figure"><img src="http://i.imgur.com/DhCAW.jpg" alt="An outline of the states, acts and outcomes in the insurance case" /> <p class="caption">An outline of the states, acts and outcomes in the insurance case</p> </div> <p>Note that decision theory is concerned with <em>particular</em> acts rather than <em>generic</em> acts, e.g. "sailing west in 1492" rather than "sailing." Moreover, the acts of a decision problem must be <em>alternative</em> acts, so that the decision maker has to choose exactly <em>one</em> act.</p> <p>Once a decision problem has been formalized, it can then be visualized in any of several ways.</p> <p>One way to visualize this decision problem is to use a <em>decision matrix</em>:</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td><em>Fire</em></td> <td><em>No fire</em></td> </tr> <tr> <td><em>Take out insurance</em></td> <td>No house and $99,900</td> <td>House and -$100</td> </tr> <tr> <td><em>No insurance</em></td> <td>No house and $0</td> <td>House and $0</td> </tr> </tbody> </table> <p>Another way to visualize this problem is to use a <em>decision tree</em>:</p> <p><img src="http://commonsenseatheism.com/wp-content/uploads/2012/09/basic-decision-tree.gif" alt="" /></p> <p>The square is a <em>choice node</em>, the circles are <em>chance nodes</em>, and the triangles are <em>terminal nodes</em>. At the choice node, the decision maker chooses which branch of the decision tree to take. At the chance nodes, <em>nature</em> decides which branch to follow. 
The triangles represent outcomes.</p> <p>Of course, we could add more branches to each choice node and each chance node. We could also add more choice nodes, in which case we are representing a <em>sequential</em> decision problem. Finally, we could add probabilities to each branch, as long as the probabilities of all the branches extending from each single node sum to 1. And because a decision tree obeys the laws of probability theory, we can calculate the probability of any given node by multiplying the probabilities of all the branches preceding it.</p> <p>Our decision problem could also be represented as a <em>vector</em> &mdash; an ordered list of mathematical objects that is perhaps most suitable for computers:</p> <blockquote> <p>[<br /> [a<sub>1</sub> = take out insurance,<br /> a<sub>2</sub> = do not];<br /> [s<sub>1</sub> = fire,<br /> s<sub>2</sub> = no fire];<br /> [(a<sub>1</sub>, s<sub>1</sub>) = No house and $99,900,<br /> (a<sub>1</sub>, s<sub>2</sub>) = House and -$100,<br /> (a<sub>2</sub>, s<sub>1</sub>) = No house and $0,<br /> (a<sub>2</sub>, s<sub>2</sub>) = House and $0]<br /> ]</p> </blockquote> <p>For more details on formalizing and visualizing decision problems, see <a href="http://www.amazon.com/Introduction-Decision-Analysis-3rd-Edition/dp/0964793865/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Skinner (1993)</a>.</p> <p>&nbsp;</p> <h2 id="how-can-i-measure-an-agents-preferences"><a href="#how-can-i-measure-an-agents-preferences">4. How can I measure an agent's preferences?</a></h2> <h3 id="the-concept-of-utility"><a href="#the-concept-of-utility">4.1. The concept of utility</a></h3> <p>It is important not to measure an agent's preferences in terms of <em>objective</em> value, e.g. monetary value. To see why, consider the absurdities that can result when we try to measure an agent's preference with money alone.</p> <p>Suppose you may choose between (A) receiving a million dollars <em>for sure</em>, and (B) a 50% chance of winning either $3 million or nothing. The <em>expected monetary value</em> (EMV) of your act is computed by multiplying the monetary value of each possible outcome by its probability. So, the EMV of choice A is (1)($1 million) = $1 million. The EMV of choice B is (0.5)($3 million) + (0.5)($0) = $1.5 million. Choice B has a higher expected monetary value, and yet many people would prefer the guaranteed million.</p> <p>Why? For many people, the difference between having $0 and $1 million is <em>subjectively</em> much larger than the difference between having $1 million and $3 million, even if the latter difference is larger in dollars.</p> <p>To capture an agent's <em>subjective</em> preferences, we use the concept of <em>utility</em>. A <em>utility function</em> assigns numbers to outcomes such that outcomes with higher numbers are preferred to outcomes with lower numbers. For example, for a particular decision maker &mdash; say, one who has no money &mdash; the utility of $0 might be 0, the utility of $1 million might be 1000, and the utility of $3 million might be 1500. Thus, the <em>expected utility</em> (EU) of choice A is, for this decision maker, (1)(1000) = 1000. Meanwhile, the EU of choice B is (0.5)(1500) + (0.5)(0) = 750. 
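</p> <p>For concreteness, here is a minimal sketch in Python of the two calculations above; the probabilities, dollar amounts, and made-up utility numbers are exactly the ones from the example, and nothing else is assumed:</p> <pre><code>def expected_value(lottery):
    # lottery: list of (probability, value) pairs; probabilities sum to 1
    return sum(p * v for p, v in lottery)

# Made-up utilities for this particular (money-less) decision maker, from the text.
utility = {0: 0, 1_000_000: 1000, 3_000_000: 1500}

choice_a = [(1.0, 1_000_000)]            # $1 million for sure
choice_b = [(0.5, 3_000_000), (0.5, 0)]  # 50% chance of $3 million, 50% chance of nothing

print(expected_value(choice_a))  # EMV of A: 1,000,000
print(expected_value(choice_b))  # EMV of B: 1,500,000
print(expected_value([(p, utility[v]) for p, v in choice_a]))  # EU of A: 1000.0
print(expected_value([(p, utility[v]) for p, v in choice_b]))  # EU of B: 750.0
</code></pre> <p>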
In this case, the expected utility of choice A is greater than that of choice B, even though choice B has a greater expected monetary value.</p> <p>Note that those from the field of statistics who work on decision theory tend to talk about a "loss function," which is simply an <em>inverse</em> utility function. For an overview of decision theory from this perspective, see <a href="http://www.amazon.com/Statistical-Decision-Bayesian-Analysis-Statistics/dp/1441930744/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Berger (1985)</a> and <a href="http://www.amazon.com/Bayesian-Choice-Decision-Theoretic-Computational-Implementation/dp/0387715983/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Robert (2001)</a>. For a critique of some standard results in statistical decision theory, see <a href="http://www.amazon.com/Probability-Theory-The-Logic-Science/dp/0521592712/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Jaynes (2003, ch. 13)</a>.</p> <p>&nbsp;</p> <h3 id="types-of-utility"><a href="#types-of-utility">4.2. Types of utility</a></h3> <p>An agent's utility function can't be directly observed, so it must be constructed &mdash; e.g. by asking them which options they prefer for a large set of pairs of alternatives (as on <a href="http://www.whoishotter.com">WhoIsHotter.com</a>). The number that corresponds to an outcome's utility can convey different information depending on the <em>utility scale</em> in use, and the utility scale in use depends on how the utility function is constructed.</p> <p>Decision theorists distinguish three kinds of utility scales:</p> <ol style="list-style-type: decimal"> <li> <p>Ordinal scales ("12 is better than 6"). In an ordinal scale, preferred outcomes are assigned higher numbers, but the numbers don't tell us anything about the differences or ratios between the utility of different outcomes.</p> </li> <li> <p>Interval scales ("the difference between 12 and 6 equals that between 6 and 0"). An interval scale gives us more information than an ordinal scale. Not only are preferred outcomes assigned higher numbers, but also the numbers accurately reflect the <em>difference</em> between the utility of different outcomes. They do not, however, necessarily reflect the ratios of utility between different outcomes. If outcome A has utility 0, outcome B has utility 6, and outcome C has utility 12 on an interval scale, then we know that the difference in utility between outcomes A and B and between outcomes B and C is the same, but we can't know whether outcome B is "twice as good" as outcome A.</p> </li> <li> <p>Ratio scales ("12 is exactly <em>twice</em> as valuable as 6"). Numerical utility assignments on a ratio scale give us the most information of all. They accurately reflect preference rankings, differences, <em>and</em> ratios. Thus, we can say that an outcome with utility 12 is exactly <em>twice</em> as valuable to the agent in question as an outcome with utility 6.</p> </li> </ol> <p>Note that neither <em>experienced utility</em> (happiness) nor the notions of "average utility" or "total utility" discussed by utilitarian moral philosophers are the same thing as the <em>decision utility</em> that we are discussing now to describe decision preferences. As the situation merits, we can be even more specific. 
For example, when discussing the type of decision utility used in an interval scale utility function constructed using Von Neumann &amp; Morgenstern's axiomatic approach (see section 8), some people use the term <em>VNM-utility</em>.</p> <p>Now that you know that an agent's preferences can be represented as a "utility function," and that assignments of utility to outcomes can mean different things depending on the utility scale of the utility function, we are ready to think more formally about the challenge of making "optimal" or "rational" choices. (We will return to the problem of constructing an agent's utility function later, in section 8.3.)</p> <p>&nbsp;</p> <h2 id="what-do-decision-theorists-mean-by-risk-ignorance-and-uncertainty"><a href="#what-do-decision-theorists-mean-by-risk-ignorance-and-uncertainty">5. What do decision theorists mean by "risk," "ignorance," and "uncertainty"?</a></h2> <p>Peterson (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2009</a>, ch. 1) explains:</p> <blockquote> <p>In decision theory, everyday terms such as <em>risk</em>, <em>ignorance</em>, and <em>uncertainty</em> are used as technical terms with precise meanings. In decisions under risk the decision maker knows the probability of the possible outcomes, whereas in decisions under ignorance the probabilities are either unknown or non-existent. Uncertainty is either used as a synonym for ignorance, or as a broader term referring to both risk and ignorance.</p> </blockquote> <p>In this FAQ, a "decision under ignorance" is one in which probabilities are <em>not</em> assigned to all outcomes, and a "decision under uncertainty" is one in which probabilities <em>are</em> assigned to all outcomes. The term "risk" will be reserved for discussions related to utility.</p> <p>&nbsp;</p> <h2 id="how-should-i-make-decisions-under-ignorance"><a href="#how-should-i-make-decisions-under-ignorance">6. How should I make decisions under ignorance?</a></h2> <p>A decision maker faces a "decision under ignorance" when she (1) knows which acts she could choose and which outcomes they may result in, but (2) is unable to assign probabilities to the outcomes.</p> <p>(Note that many theorists think that all decisions under ignorance can be transformed into decisions under uncertainty, in which case this section will be irrelevant except for subsection 6.1. For details, see section 7.)</p> <p>&nbsp;</p> <h3 id="the-dominance-principle"><a href="#the-dominance-principle">6.1. The dominance principle</a></h3> <p>To borrow an example from Peterson (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2009</a>, ch. 3), suppose that Jane isn't sure whether to order hamburger or monkfish at a new restaurant. Just about any chef can make an edible hamburger, and she knows that monkfish is fantastic if prepared by a world-class chef, but she also recalls that monkfish is difficult to cook. Unfortunately, she knows too little about this restaurant to assign any probability to the prospect of getting good monkfish. 
Her decision matrix might look like this:</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td><em>Good chef</em></td> <td><em>Bad chef</em></td> </tr> <tr> <td><em>Monkfish</em></td> <td>good monkfish</td> <td>terrible monkfish</td> </tr> <tr> <td><em>Hamburger</em></td> <td>edible hamburger</td> <td>edible hamburger</td> </tr> <tr> <td><em>No main course</em></td> <td>hungry</td> <td>hungry</td> </tr> </tbody> </table> <p>Here, decision theorists would say that the "hamburger" choice <em>dominates</em> the "no main course" choice. This is because choosing the hamburger leads to a better outcome for Jane no matter which possible state of the world (good chef or bad chef) turns out to be true.</p> <p>This <em>dominance principle</em> comes in two forms:</p> <ul> <li><em>Weak dominance</em>: One act is <em>more</em> rational than another if (1) all its possible outcomes are at least as good as those of the other, and if (2) there is at least one possible outcome that is better than that of the other act.</li> <li><em>Strong dominance</em>: One act is <em>more</em> rational than another if all of its possible outcomes are better than those of the other act.</li> </ul> <div class="figure"><img src="http://i.imgur.com/7fU6U.jpg" alt="A comparison of strong and weak dominance" /> <p class="caption">A comparison of strong and weak dominance</p> </div> <p>The dominance principle can also be applied to decisions under uncertainty (in which probabilities <em>are</em> assigned to all the outcomes). If we assign probabilities to outcomes, it is still rational to choose one act over another act if all its outcomes are at least as good as the outcomes of the other act.</p> <p>However, the dominance principle only applies (non-controversially) when the agent&rsquo;s acts are independent of the state of the world. So consider the decision of whether to steal a coat:</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td><em>Charged with theft</em></td> <td><em>Not charged with theft</em></td> </tr> <tr> <td><em>Theft</em></td> <td>Jail and coat</td> <td>Freedom and coat</td> </tr> <tr> <td><em>No theft</em></td> <td>Jail</td> <td>Freedom</td> </tr> </tbody> </table> <p>In this case, stealing the coat dominates not doing so but isn&rsquo;t necessarily the rational decision. After all, stealing increases your chance of getting charged with theft and might be irrational for this reason. So dominance doesn&rsquo;t apply in cases like this, where the state of the world is not independent of the agent&rsquo;s act.</p> <p>On top of this, not all decision problems include an act that dominates all the others. Consequently, additional principles are often required to reach a decision.</p> <p>&nbsp;</p> <h3 id="maximin-and-leximin"><a href="#maximin-and-leximin">6.2. Maximin and leximin</a></h3> <p>Some decision theorists have suggested the <em>maximin principle</em>: if the worst possible outcome of one act is better than the worst possible outcome of another act, then the former act should be chosen.
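</p> <p>As a minimal sketch of how the maximin rule works, here it is applied to Jane's menu problem in Python; the numerical values attached to the outcomes are made up for illustration (maximin only needs an ordinal ranking, and any numbers that respect Jane's ranking give the same answer):</p> <pre><code># Made-up ordinal values: higher is better; only the ordering matters.
value = {"good monkfish": 4, "edible hamburger": 3, "hungry": 2, "terrible monkfish": 1}

# Each act maps to its possible outcomes under the two states (good chef, bad chef).
acts = {
    "monkfish": ["good monkfish", "terrible monkfish"],
    "hamburger": ["edible hamburger", "edible hamburger"],
    "no main course": ["hungry", "hungry"],
}

def maximin(acts, value):
    # Choose the act whose worst possible outcome is best.
    return max(acts, key=lambda act: min(value[o] for o in acts[act]))

print(maximin(acts, value))  # hamburger
</code></pre> <p>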
In Jane's decision problem above, the maximin principle would prescribe choosing the hamburger, because the worst possible outcome of choosing the hamburger ("edible hamburger") is better than the worst possible outcome of choosing the monkfish ("terrible monkfish") and is also better than the worst possible outcome of eating no main course ("hungry").</p> <p>If the worst outcomes of two or more acts are equally good, the maximin principle tells you to be indifferent between them. But that doesn't seem right. For this reason, fans of the maximin principle often invoke the <em>lexical</em> maximin principle ("leximin"), which says that if the worst outcomes of two or more acts are equally good, one should choose the act for which the <em>second worst</em> outcome is best. (If that doesn't single out a single act, then the <em>third worst</em> outcome should be considered, and so on.)</p> <p>Why adopt the leximin principle? Advocates point out that the leximin principle transforms a decision problem under ignorance into a decision problem under partial certainty. The decision maker doesn't know what the outcome will be, but they know what the worst possible outcome will be.</p> <p>But in some cases, the leximin rule seems clearly irrational. Imagine this decision problem, with two possible acts and two possible states of the world:</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td>s<sub>1</sub></td> <td>s<sub>2</sub></td> </tr> <tr> <td>a<sub>1</sub></td> <td>$1</td> <td>$10,001.01</td> </tr> <tr> <td>a<sub>2</sub></td> <td>$1.01</td> <td>$1.01</td> </tr> </tbody> </table> <p>In this situation, the leximin principle prescribes choosing a<sub>2</sub>. But most people would agree it is rational to risk losing out on a single cent for the chance to get an extra $10,000.</p> <p>&nbsp;</p> <h3 id="maximax-and-optimism-pessimism"><a href="#maximax-and-optimism-pessimism">6.3. Maximax and optimism-pessimism</a></h3> <p>The maximin and leximin rules focus their attention on the worst possible outcomes of a decision, but why not focus on the <em>best</em> possible outcome? The <em>maximax principle</em> prescribes that if the best possible outcome of one act is better than the best possible outcome of another act, then the former act should be chosen.</p> <p>More popular among decision theorists is the <em>optimism-pessimism rule</em> (<em>aka</em> the <em>alpha-index rule</em>). The optimism-pessimism rule prescribes that one consider both the best and worst possible outcome of each possible act, and then choose according to one's degree of optimism or pessimism.</p> <p>Here's an example from Peterson (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2009</a>, ch. 3):</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td>s<sub>1</sub></td> <td>s<sub>2</sub></td> <td>s<sub>3</sub></td> <td>s<sub>4</sub></td> <td>s<sub>5</sub></td> <td>s<sub>6</sub></td> </tr> <tr> <td>a<sub>1</sub></td> <td>55</td> <td>18</td> <td>28</td> <td>10</td> <td>36</td> <td>100</td> </tr> <tr> <td>a<sub>2</sub></td> <td>50</td> <td>87</td> <td>55</td> <td>90</td> <td>75</td> <td>70</td> </tr> </tbody> </table> <p>We represent the decision maker's level of optimism on a scale of 0 to 1, where 0 is maximal pessimism and 1 is maximal optimism. 
For a<sub>1</sub>, the worst possible outcome is 10 and the best possible outcome is 100. That is, min(a<sub>1</sub>) = 10 and max(a<sub>1</sub>) = 100. So if the decision maker is 0.85 optimistic, then the total value of a<sub>1</sub> is (0.85)(100) + (1 - 0.85)(10) = 86.5, and the total value of a<sub>2</sub> is (0.85)(90) + (1 - 0.85)(50) = 84. In this situation, the optimism-pessimism rule prescribes action a<sub>1</sub>.</p> <p>If the decision maker's optimism is 0, then the optimism-pessimism rule collapses into the maximin rule because (0)(max(a<sub>i</sub>)) + (1 - 0)(min(a<sub>i</sub>)) = min(a<sub>i</sub>). And if the decision maker's optimism is 1, then the optimism-pessimism rule collapses into the maximax rule. Thus, the optimism-pessimism rule turns out to be a generalization of the maximin and maximax rules. (Well, sort of. The minimax and maximax principles require only that we measure value on an ordinal scale, whereas the optimism-pessimism rule requires that we measure value on an interval scale.)</p> <p>The optimism-pessimism rule pays attention to both the best-case and worst-case scenarios, but is it rational to ignore all the outcomes in between? Consider this example:</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td>s<sub>1</sub></td> <td>s<sub>2</sub></td> <td>s<sub>3</sub></td> </tr> <tr> <td>a<sub>1</sub></td> <td>1</td> <td>2</td> <td>100</td> </tr> <tr> <td>a<sub>2</sub></td> <td>1</td> <td>99</td> <td>100</td> </tr> </tbody> </table> <p>The maximum and minimum values for a<sub>1</sub> and a<sub>2</sub> are the same, so for every degree of optimism both acts are equally good. But it seems obvious that one should choose a<sub>2</sub>.</p> <p>&nbsp;</p> <h3 id="other-decision-principles"><a href="#other-decision-principles">6.4. Other decision principles</a></h3> <p>Many other decision principles for dealing with decisions under ignorance have been proposed, including <a href="http://teaching.ust.hk/~bee/papers/misc/Regret%20Theory%20An%20Alternative%20Theory%20of%20Rational%20Choice%20Under%20Uncertainty.pdf">minimax regret</a>, <a href="http://www.amazon.com/Info-Gap-Decision-Theory-Second-Edition/dp/0123735521/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">info-gap</a>, and <a href="http://www.existential-risk.org/concept.pdf">maxipok</a>. For more details on making decisions under ignorance, see <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009)</a> and <a href="http://www.dss.dpem.tuc.gr/pdf/Choice%20under%20complete%20uncertainty%20-%20axiomatic%20characterizati.pdf">Bossert et al. (2000)</a>.</p> <p>One queer feature of the decision principles discussed in this section is that they willfully disregard some information relevant to making a decision. 
Such a move could make sense when trying to find a decision algorithm that performs well under tight limits on available computation (<a href="http://www.dss.dpem.tuc.gr/pdf/An%20axiomatic%20treatment%20of%20three%20qualitative%20decision%20criteri.pdf">Brafman &amp; Tennenholtz (2000)</a>), but it's unclear why an <em>ideal</em> agent with infinite computing power (fit for a <em>normative</em> rather than a <em>prescriptive</em> theory) should willfully disregard information.</p> <p>&nbsp;</p> <h2 id="can-decisions-under-ignorance-be-transformed-into-decisions-under-uncertainty"><a href="#can-decisions-under-ignorance-be-transformed-into-decisions-under-uncertainty">7. Can decisions under ignorance be transformed into decisions under uncertainty?</a></h2> <p>Can decisions under ignorance be transformed into decisions under uncertainty? This would simplify things greatly, because there is near-universal agreement that decisions under uncertainty should be handled by "maximizing expected utility" (see section 11 for clarifications), whereas decision theorists still debate what should be done about decisions under ignorance.</p> <p>For <a href="http://en.wikipedia.org/wiki/Bayesian_probability">Bayesians</a> (see section 10), <em>all</em> decisions under ignorance are transformed into decisions under uncertainty (<a href="http://www.amazon.com/Introduction-Bayesian-Inference-Decision-Edition/dp/0964793849/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Winkler 2003</a>, ch. 5) when the decision maker assigns an "ignorance prior" to each outcome for which they don't know how to assign a probability. (Another way of saying this is to say that a Bayesian decision maker never faces a decision under ignorance, because a Bayesian must always assign a prior probability to events.) One must then consider how to assign priors, an important debate among Bayesians (see section 10).</p> <p>Many non-Bayesian decision theorists also think that decisions under ignorance can be transformed into decisions under uncertainty due to something called the <em>principle of insufficient reason</em>. The principle of insufficient reason prescribes that if you have literally <em>no</em> reason to think that one state is more probable than another, then one should assign <em>equal</em> probability to both states.</p> <p>One objection to the principle of insufficient reason is that it is very sensitive to how states are individuated. Peterson (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2009</a>, ch. 3) explains:</p> <blockquote> <p>Suppose that before embarking on a trip you consider whether to bring an umbrella or not. [But] you know nothing about the weather at your destination. If the formalization of the decision problem is taken to include only two states, viz. rain and no rain, [then by the principle of insufficient reason] the probability of each state will be 1/2. However, it seems that one might just as well go for a formalization that divides the space of possibilities into three states, viz. heavy rain, moderate rain, and no rain. If the principle of insufficient reason is applied to the latter set of states, their probabilities will be 1/3. In some cases this difference will affect our decisions. 
Hence, it seems that anyone advocating the principle of insufficient reason must [defend] the rather implausible hypothesis that there is only one correct way of making up the set of states.</p> </blockquote> <div class="figure"><img src="http://i.imgur.com/kXn03.jpg" alt="An objection to the principle of insufficient reason" /> <p class="caption">An objection to the principle of insufficient reason</p> </div> <p>Advocates of the principle of insufficient reason might respond that one must consider <em>symmetric</em> states. For example, if someone gives you a die with <em>n</em> sides and you have no reason to think the die is biased, then you should assign a probability of 1/<em>n</em> to each side. But, Peterson notes:</p> <blockquote> <p>...not all events can be described in symmetric terms, at least not in a way that justifies the conclusion that they are equally probable. Whether Ann's marriage will be a happy one depends on her future emotional attitude toward her husband. According to one description, she could be either in love or not in love with him; then the probability of both states would be 1/2. According to another equally plausible description, she could either be deeply in love, a little bit in love or not at all in love with her husband; then the probability of each state would be 1/3.</p> <p>&nbsp;</p> </blockquote> <h2 id="how-should-i-make-decisions-under-uncertainty"><a href="#how-should-i-make-decisions-under-uncertainty">8. How should I make decisions under uncertainty?</a></h2> <p>A decision maker faces a "decision under uncertainty" when she (1) knows which acts she could choose and which outcomes they may result in, and she (2) assigns probabilities to the outcomes.</p> <p>Decision theorists generally agree that when facing a decision under uncertainty, it is rational to choose the act with the highest expected utility. This is the principle of <em>expected utility maximization</em> (EUM).</p> <p>Decision theorists offer two kinds of justifications for EUM. The first has to do with the law of large numbers (see section 8.1). The second has to do with the axiomatic approach (see sections 8.2 through 8.6).</p> <p>&nbsp;</p> <h3 id="the-law-of-large-numbers"><a href="#the-law-of-large-numbers">8.1. The law of large numbers</a></h3> <p>The "law of large numbers" states that, <em>in the long run</em>, if you face the same decision problem again and again and again, and you always choose the act with the highest expected utility, then you will almost certainly be better off than if you choose any other act.</p> <p>There are two problems with using the law of large numbers to justify EUM. The first problem is that the world is ever-changing, so we rarely if ever face the same decision problem "again and again and again." The law of large numbers says that if you face the same decision problem infinitely many times, then the probability that you could do better by not maximizing expected utility approaches zero. But you won't ever face the same decision problem infinitely many times! Why should you care what would happen if a certain condition held, if you know that condition will never hold?</p> <p>The second problem with using the law of large numbers to justify EUM has to do with a mathematical theorem known as <em>gambler's ruin</em>. Imagine that you and I flip a fair coin, and I pay you $1 every time it comes up heads and you pay me $1 every time it comes up tails. We both start with $100.
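</p> <p>Here is a quick Python sketch that simulates the game just described; only the $100 starting bankrolls come from the example, and the random seed is arbitrary:</p> <pre><code>import random

def flips_until_ruin(bankroll_you=100, bankroll_me=100, seed=0):
    # Flip a fair coin repeatedly; $1 moves from the loser to the winner each flip.
    # Return the number of flips it takes before one of us is broke.
    rng = random.Random(seed)
    flips = 0
    while bankroll_you and bankroll_me:  # stop as soon as either bankroll hits $0
        flips += 1
        if rng.randint(0, 1):  # heads: I pay you $1
            bankroll_you += 1
            bankroll_me -= 1
        else:                  # tails: you pay me $1
            bankroll_you -= 1
            bankroll_me += 1
    return flips

print(flips_until_ruin())  # a finite number of flips, after which one of us is broke
</code></pre> <p>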
If we flip the coin enough times, one of us will face a situation in which the sequence of heads or tails is longer than we can afford. If a long-enough sequence of heads comes up, I'll run out of $1 bills with which to pay you. If a long-enough sequence of tails comes up, you won't be able to pay me. So in this situation, the law of large numbers guarantees that you will be better off in the long run by maximizing expected utility only if you start the game with an infinite amount of money (so that you never go broke), which is an unrealistic assumption. (For technical convenience, assume utility increases linearly with money. But the basic point holds without this assumption.)</p> <p>&nbsp;</p> <h3 id="the-axiomatic-approach"><a href="#the-axiomatic-approach">8.2. The axiomatic approach</a></h3> <p>The other method for justifying EUM seeks to show that EUM can be derived from axioms that hold regardless of what happens in the long run.</p> <p>In this section we will review perhaps the most famous axiomatic approach, from <a href="http://www.amazon.com/Economic-Behavior-Commemorative-Princeton-Editions/dp/0691130612/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Von Neumann and Morgenstern (1947)</a>. Other axiomatic approaches include <a href="http://www.amazon.com/The-Foundations-Statistics-Leonard-Savage/dp/0486623491/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Savage (1954)</a>, <a href="http://www.amazon.com/The-Logic-Decision-Richard-Jeffrey/dp/0226395820/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Jeffrey (1983)</a>, and <a href="http://pages.stern.nyu.edu/~dbackus/Exotic/1Ambiguity/AnscombeAumann%20AMS%2063.pdf">Anscombe &amp; Aumann (1963)</a>.</p> <p>&nbsp;</p> <h3 id="the-von-neumann-morgenstern-utility-theorem"><a href="#the-von-neumann-morgenstern-utility-theorem">8.3. The Von Neumann-Morgenstern utility theorem</a></h3> <p>The first decision theory axiomatization appeared in an appendix to the second edition of Von Neumann &amp; Morgenstern's <em><a href="http://www.amazon.com/Economic-Behavior-Commemorative-Princeton-Editions/dp/0691130612/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Theory of Games and Economic Behavior</a></em> (1947). An important point to note up front is that, in this axiomatization, Von Neumann and Morgenstern take the options that the agent chooses between to not be acts, as we&rsquo;ve defined them, but lotteries (where a lottery is a set of outcomes, each paired with a probability). As such, while discussing their axiomatization, we will talk of lotteries. (Despite making this distinction, acts and lotteries are closely related. 
Under the conditions of uncertainty that we are considering here, each act will be associated with some lottery and so preferences over lotteries could be used to determine preferences over acts, if so desired).</p> <p>The key feature of the Von Neumann and Morgenstern axiomatization is a proof that if a decision maker states her preferences over a set of lotteries, and if her preferences conform to a set of intuitive structural constraints (axioms), then we can construct a utility function (on an interval scale) from her preferences over lotteries and show that she acts <em>as if</em> she maximizes expected utility with respect to that utility function.</p> <p>What are the axioms to which an agent's preferences over lotteries must conform? There are four of them.</p> <ol style="list-style-type: decimal"> <li> <p>The <em>completeness axiom</em> states that the agent must <em>bother to state a preference</em> for each pair of lotteries. That is, the agent must prefer A to B, or prefer B to A, or be indifferent between the two.</p> </li> <li> <p>The <em>transitivity axiom</em> states that if the agent prefers A to B and B to C, she must also prefer A to C.</p> </li> <li> <p>The <em>independence axiom</em> states that, for example, if an agent prefers an apple to an orange, then she must also prefer the lottery [55% chance she gets an apple, otherwise she gets cholera] over the lottery [55% chance she gets an orange, otherwise she gets cholera]. More generally, this axiom holds that a preference must hold independently of the possibility of another outcome (e.g. cholera).</p> </li> <li> <p>The <em>continuity axiom</em> holds that if the agent prefers A to B to C, then there exists a unique <em>p</em> (probability) such that the agent is indifferent between [<em>p</em>(A) + (1 - <em>p</em>)(C)] and [outcome B with certainty].</p> </li> </ol> <p>The continuity axiom requires <a href="http://www.youtube.com/watch?v=hSUsiA8dhKM">more explanation</a>. Suppose that A = $1 million, B = $0, and C = Death. If <em>p</em> = 0.5, then the agent's two lotteries under consideration for the moment are:</p> <ol style="list-style-type: decimal"> <li>(0.5)($1M) + (1 - 0.5)(Death) [win $1M with 50% probability, die with 50% probability]</li> <li>(1)($0) [win $0 with certainty]</li> </ol> <p>Most people would <em>not</em> be indifferent between $0 with certainty and [50% chance of $1M, 50% chance of Death] &mdash; the risk of Death is too high! But if you have continuous preferences, there is <em>some</em> probability <em>p</em> for which you'd be indifferent between these two lotteries. Perhaps <em>p</em> is very, very high:</p> <ol style="list-style-type: decimal"> <li>(0.999999)($1M) + (1 - 0.999999)(Death) [win $1M with 99.9999% probability, die with 0.0001% probability]</li> <li>(1)($0) [win $0 with certainty]</li> </ol> <p>Perhaps now you'd be indifferent between lottery 1 and lottery 2. Or maybe you'd be <em>more</em> willing to risk Death for the chance of winning $1M, in which case the <em>p</em> for which you'd be indifferent between lotteries 1 and 2 is lower than 0.999999. As long as there is <em>some</em> <em>p</em> at which you'd be indifferent between lotteries 1 and 2, your preferences are "continuous."</p>
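<p>To make the continuity axiom concrete, here is a minimal sketch in Python. The utility numbers are entirely made up (there is no canonical utility for Death); the point is only that, once the three outcomes are assigned utilities with A above B above C, the indifference probability <em>p</em> is pinned down by a single linear equation.</p> <pre><code># A minimal sketch of the continuity axiom, with made-up utilities.
# Indifference between [p(A) + (1-p)(C)] and B-for-certain requires
#   p*u_A + (1-p)*u_C = u_B,  so  p = (u_B - u_C) / (u_A - u_C).

u_A = 1.0       # utility of $1M (hypothetical scale)
u_B = 0.0       # utility of $0
u_C = -1000.0   # utility of Death (hypothetical; any value below u_B works)

p = (u_B - u_C) / (u_A - u_C)
print(p)  # ~0.999001: this agent needs about a 99.9% chance of $1M to accept the gamble
</code></pre>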
<p>Given this setup, Von Neumann and Morgenstern proved their theorem, which states that if the agent's preferences over lotteries obey their axioms, then:</p> <ul> <li>The agent's preferences can be represented by a utility function that assigns higher utility to preferred lotteries.</li> <li>The agent acts in accordance with the principle of maximizing expected utility.</li> <li>All utility functions satisfying the above two conditions are "positive linear transformations" of each other. (Without going into the details: this is why VNM-utility is measured on an interval scale.)</li> </ul> <p>&nbsp;</p> <h3 id="vnm-utility-theory-and-rationality"><a href="#vnm-utility-theory-and-rationality">8.4. VNM utility theory and rationality</a></h3> <p>An agent which conforms to the VNM axioms is sometimes said to be "VNM-rational." But why should "VNM-rationality" constitute our notion of <em>rationality in general</em>? How could VNM's result justify the claim that a rational agent maximizes expected utility when facing a decision under uncertainty? The argument goes like this:</p> <ol style="list-style-type: decimal"> <li>If an agent chooses lotteries which it prefers (in decisions under uncertainty), and if its preferences conform to the VNM axioms, then it is rational. Otherwise, it is irrational.</li> <li>If an agent chooses lotteries which it prefers (in decisions under uncertainty), and if its preferences conform to the VNM axioms, then it maximizes expected utility.</li> <li>Therefore, a rational agent maximizes expected utility (in decisions under uncertainty).</li> </ol> <p>Von Neumann and Morgenstern proved premise 2, and the conclusion follows from premises 1 and 2. But why accept premise 1?</p> <p>Few people deny that it would be irrational for an agent to choose a lottery which it does not prefer. But why is it irrational for an agent's preferences to violate the VNM axioms? I will save that discussion for section 8.6.</p> <p>&nbsp;</p> <h3 id="objections-to-vnm-rationality"><a href="#objections-to-vnm-rationality">8.5. Objections to VNM-rationality</a></h3> <p>Several objections have been raised to Von Neumann and Morgenstern's result:</p> <ol style="list-style-type: decimal"> <li> <p><em>The VNM axioms are too strong</em>. Some have argued that the VNM axioms are not self-evidently true. See section 8.6.</p> </li> <li> <p><em>The VNM system offers no action guidance</em>. A VNM-rational decision maker cannot use VNM utility theory for action guidance, because she must state her preferences over lotteries at the start. But if an agent can state her preferences over lotteries, then she already knows which lottery to choose. (For more on this, see section 9.)</p> </li> <li> <p><em>In the VNM system, utility is defined via preferences over lotteries rather than preferences over outcomes</em>. To many, it seems odd to <em>define</em> utility with respect to preferences over lotteries. Many would argue that utility should be defined in relation to preferences over <em>outcomes</em> or <em>world-states</em>, and that's not what the VNM system does. (Also see section 9.)</p> </li> </ol> <p>&nbsp;</p> <h3 id="should-we-accept-the-vnm-axioms"><a href="#should-we-accept-the-vnm-axioms">8.6. Should we accept the VNM axioms?</a></h3> <p>The VNM preference axioms define what it is for an agent to be VNM-rational. But why should we accept these axioms?
Usually, it is argued that each of the axioms is <em>pragmatically justified</em> because an agent which violates the axioms can face situations in which they are guaranteed to end up worse off (from <em>their own</em> perspective).</p> <p>In sections 8.6.1 and 8.6.2 I go into some detail about pragmatic justifications offered for the transitivity and completeness axioms. For more detail, including arguments about the justification of the other axioms, see <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, ch. 8)</a> and <a href="http://www.amazon.com/Foundations-Rational-Choice-Under-Risk/dp/0198774427/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Anand (1993)</a>.</p> <p>&nbsp;</p> <h4 id="the-transitivity-axiom"><a href="#the-transitivity-axiom">8.6.1. The transitivity axiom</a></h4> <p>Consider the <em>money-pump argument</em> in favor of the transitivity axiom ("if the agent prefers A to B and B to C, she must also prefer A to C").</p> <blockquote> <p>Imagine that a friend offers to give you exactly one of her three... novels, x or y or z... [and] that your preference ordering over the three novels is... [that] you prefer x to y, and y to z, and z to x... [That is, your preferences are <em>cyclic</em>, which is a type of <em>intransitive</em> preference relation.] Now suppose that you are in possession of z, and that you are invited to swap z for y. Since you prefer y to z, rationality obliges you to swap. So you swap, and temporarily get y. You are then invited to swap y for x, which you do, since you prefer x to y. Finally, you are offered to <em>pay a small amount</em>, say one cent, for swapping x for z. Since z is strictly [preferred to] x, even after you have paid the fee for swapping, rationality tells you that you should accept the offer. This means that you end up where you started, the only difference being that you now have one cent less. This procedure is thereafter iterated over and over again. After a billion cycles you have lost ten million dollars, for which you have got nothing in return. (<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson 2009</a>, ch. 8)</p> </blockquote> <div class="figure"><img src="http://i.imgur.com/45csd.jpg" alt="An example of a money-pump argument" /> <p class="caption">An example of a money-pump argument</p> </div> <p>Similar arguments (e.g. <a href="http://johanegustafsson.net/papers/a_money-pump_for_acyclic_intransitive_preferences.pdf">Gustafsson 2010</a>) aim to show that the other kind of intransitive preferences (acyclic preferences) are irrational, too.</p> <p>(Of course, pragmatic arguments need not be framed in monetary terms. We could just as well construct an argument showing that an agent with intransitive preferences can be "pumped" of all their happiness, or all their moral virtue, or all their Twinkies.)</p>
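<p>The money pump is easy to simulate. Here is a minimal sketch in Python (the one-cent fee and the number of cycles are arbitrary choices for illustration): an agent with the cyclic preferences x over y, y over z, and z over x accepts every swap it prefers, pays one cent each time it swaps back to z, and so drains its wealth while its holdings go nowhere.</p> <pre><code># A minimal sketch of the money-pump argument against cyclic preferences.

prefers = {('x', 'y'), ('y', 'z'), ('z', 'x')}   # (a, b) means a is preferred to b

holding, wealth = 'z', 0.00
trades = [('y', 0.00), ('x', 0.00), ('z', 0.01)] * 1000   # 1000 full cycles

for offered, fee in trades:
    if (offered, holding) in prefers:   # the agent accepts any swap it prefers
        holding, wealth = offered, wealth - fee

print(holding, round(wealth, 2))   # ends holding z again, but $10.00 poorer
</code></pre>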
<p>&nbsp;</p> <h4 id="the-completeness-axiom"><a href="#the-completeness-axiom">8.6.2. The completeness axiom</a></h4> <p>The completeness axiom ("the agent must prefer A to B, or prefer B to A, or be indifferent between the two") is often attacked by saying that some goods or outcomes are incommensurable &mdash; that is, they cannot be compared. For example, must a rational agent be able to state a preference (or indifference) between money and human welfare?</p> <p>Perhaps the completeness axiom can be justified with a pragmatic argument. If you think it is rationally permissible to swap between two incommensurable goods, then one can construct a money pump argument in favor of the completeness axiom. But if you think it is <em>not</em> rational to swap between incommensurable goods, then one cannot construct a money pump argument for the completeness axiom. (In fact, even if it is rational to swap between incommensurable goods, <a href="http://personal.rhul.ac.uk/uhte/035/incomplete%20preferences.geb.pdf">Mandler (2005)</a> has demonstrated that an agent that allows its current choices to depend on previous ones can avoid being money pumped.)</p> <p>And in fact, there is a popular argument <em>against</em> the completeness axiom: the "small improvement argument." For details, see <a href="http://www.amazon.com/Incommensurability-Incomparability-Practical-Reason-Chang/dp/0674447565/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Chang (1997)</a> and <a href="https://www.msb.se/Upload/Om%20MSB/Forskning/Projektrapporter/Peterson_artiklar/Small_Improvment_Argument.pdf">Espinoza (2007)</a>.</p> <p>Note that in <a href="http://en.wikipedia.org/wiki/Revealed_preference">revealed preference theory</a>, according to which preferences are revealed through choice behavior, there is no room for incommensurable preferences because every choice always reveals a preference relation of "better than," "worse than," or "equally as good as."</p> <p>Another proposal for dealing with the apparent incommensurability of some goods (such as money and human welfare) is the <em>multi-attribute approach</em>:</p> <blockquote> <p>In a multi-attribute approach, each type of attribute is measured in the unit deemed to be most suitable for that attribute. Perhaps money is the right unit to use for measuring financial costs, whereas the number of lives saved is the right unit to use for measuring human welfare. The total value of an alternative is thereafter determined by aggregating the attributes, e.g. money and lives, into an overall ranking of available alternatives...</p> </blockquote> <blockquote> <p>Several criteria have been proposed for choosing among alternatives with multiple attributes... [For example,] additive criteria assign weights to each attribute, and rank alternatives according to the weighted sum calculated by multiplying the weight of each attribute with its value... [But while] it is perhaps contentious to measure the utility of very different objects on a common scale, ...it seems equally contentious to assign numerical weights to attributes as suggested here....</p> </blockquote> <blockquote> <p>[Now let us] consider a very general objection to multi-attribute approaches. According to this objection, there exist several equally plausible but different ways of constructing the list of attributes. Sometimes the outcome of the decision process depends on which set of attributes is chosen.
(<a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson 2009</a>, ch. 8)</p> </blockquote> <p>For more on the multi-attribute approach, see <a href="http://www.amazon.com/Decisions-Multiple-Objectives-Preferences-Tradeoffs/dp/0521438837/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Keeney &amp; Raiffa (1993)</a>.</p> <p>&nbsp;</p> <h4 id="the-allais-paradox"><a href="#the-allais-paradox">8.6.3. The Allais paradox</a></h4> <p>Having considered the transitivity and completeness axioms, we can now turn to independence (a preference holds independently of considerations of other possible outcomes). Do we have any reason to reject this axiom? Here&rsquo;s one reason to think we might: in a case known as the <em>Allais paradox</em> (<a href="http://www.jstor.org/stable/1907921">Allais 1953</a>), it may seem reasonable to act in a way that contradicts independence.</p> <p>The Allais paradox asks us to consider two decisions (this version of the paradox is based on <a href="/lw/my/the_allais_paradox/">Yudkowsky (2008)</a>). The first decision involves the choice between:</p> <p>(1A) A certain $24,000; and (1B) A 33/34 chance of $27,000 and a 1/34 chance of nothing.</p> <p>The second involves the choice between:</p> <p>(2A) A 34% chance of $24,000 and a 66% chance of nothing; and (2B) A 33% chance of $27,000 and a 67% chance of nothing.</p> <p>Experiments have shown that many people prefer (1A) to (1B) and (2B) to (2A). However, these preferences contradict independence. Option 2A is the same as [a 34% chance of option 1A and a 66% chance of nothing] while 2B is the same as [a 34% chance of option 1B and a 66% chance of nothing]. So independence implies that anyone who prefers (1A) to (1B) must also prefer (2A) to (2B).</p> <p>When this result was first uncovered, it was presented as evidence against the independence axiom. However, while the Allais paradox clearly reveals that independence fails as a <em>descriptive</em> account of choice, it&rsquo;s less clear what it implies about the normative account of rational choice that we are discussing in this document. As noted in <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521716543/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, ch. 4)</a>:</p> <blockquote> <p>[S]ince many people who have thought very hard about this example still feel that it would be rational to stick to the problematic preference pattern described above, there seems to be something wrong with the expected utility principle.</p> </blockquote> <p>However, Peterson then goes on to note that many people, like the statistician Leonard Savage, argue that it is people&rsquo;s preferences in the Allais paradox that are in error, rather than the independence axiom. If so, then the paradox seems to reveal the danger of relying too strongly on intuition to determine the form that should be taken by normative theories of rationality.</p>
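<p>The conflict with independence can also be checked directly. Here is a minimal sketch in Python: because 2A and 2B are just 1A and 1B run with probability 0.34 (and nothing otherwise), the gap in expected utility between 2A and 2B is always 0.34 times the gap between 1A and 1B, whatever utilities we assign to the three outcomes. The utility values below are arbitrary placeholders.</p> <pre><code># A minimal sketch of why the common Allais preferences violate expected utility:
# for ANY utility assignment, EU(2A) - EU(2B) = 0.34 * (EU(1A) - EU(1B)), so an
# EU maximizer who prefers 1A to 1B must also prefer 2A to 2B.

def eu(lottery, u):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[outcome] for outcome, p in lottery.items())

u = {'$27,000': 1.0, '$24,000': 0.95, '$0': 0.0}   # hypothetical utilities

L1A = {'$24,000': 1.0}
L1B = {'$27,000': 33/34, '$0': 1/34}
L2A = {'$24,000': 0.34, '$0': 0.66}
L2B = {'$27,000': 0.33, '$0': 0.67}

print(eu(L1A, u) - eu(L1B, u))            # sign of the first preference
print(eu(L2A, u) - eu(L2B, u))            # same sign: 0.34 times the line above
print(0.34 * (eu(L1A, u) - eu(L1B, u)))   # check the identity numerically
</code></pre>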
<p>&nbsp;</p> <h4 id="the-ellsberg-paradox"><a href="#the-ellsberg-paradox">8.6.4. The Ellsberg paradox</a></h4> <p>The Allais paradox is far from the only case where people fail to act in accordance with EUM. Another well-known case is the Ellsberg paradox (the following is taken from <a href="http://www.amazon.com/Choices-An-Introduction-Decision-Theory/dp/0816614407/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Resnik (1987)</a>):</p> <blockquote> <p>An urn contains ninety uniformly sized balls, which are randomly distributed. Thirty of the balls are yellow, the remaining sixty are red or blue. We are not told how many red (blue) balls are in the urn &ndash; except that they number anywhere from zero to sixty. Now consider the following pair of situations. In each situation a ball will be drawn and we will be offered a bet on its color. In situation A we will choose between betting that it is yellow or that it is red. In situation B we will choose between betting that it is red or blue or that it is yellow or blue.</p> </blockquote> <p>If we guess the correct color, we will receive a payout of $100. In the Ellsberg paradox, many people bet <em>yellow</em> in situation A and <em>red or blue</em> in situation B. Further, many people make these decisions not because they are indifferent in both situations, and so happy to choose either way, but rather because they have a strict preference to choose in this manner.</p> <div class="figure"><img src="http://i.imgur.com/tZKOsHx.jpg" alt="The Ellsberg paradox" /> <p class="caption">The Ellsberg paradox</p> </div> <p>However, such behavior cannot be in accordance with EUM. In order for EUM to endorse a strict preference for choosing <em>yellow</em> in situation A, the agent would have to assign a probability of more than 1/3 to the ball selected being blue. On the other hand, in order for EUM to endorse a strict preference for choosing <em>red or blue</em> in situation B the agent would have to assign a probability of less than 1/3 to the selected ball being blue. As such, these decisions can&rsquo;t be jointly endorsed by an agent following EUM.</p> <p>Those who deny that decision making under ignorance can be transformed into decision making under uncertainty have an easy response to the Ellsberg paradox: as this case involves deciding under a situation of ignorance, it is irrelevant whether people&rsquo;s decisions violate EUM in this case as EUM is not applicable to such situations.</p> <p>Those who believe that EUM provides a suitable standard for choice in such situations, however, need to find some other way of responding to the paradox. As with the Allais paradox, there is some disagreement about how best to do so. Once again, however, many people, including Leonard Savage, argue that EUM reaches the right decision in this case. It is our intuitions that are flawed (see again <a href="http://www.amazon.com/Choices-An-Introduction-Decision-Theory/dp/0816614407/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Resnik (1987)</a> for a nice summary of Savage&rsquo;s argument to this conclusion).</p>
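<p>A short Python sketch makes the tension explicit: sweeping over every probability the agent might assign to drawing a blue ball, there is no assignment under which EUM strictly prefers both betting on yellow in situation A and betting on red-or-blue in situation B. The $100 payoff and the grid of candidate probabilities are just illustrative choices.</p> <pre><code># A minimal sketch of the Ellsberg tension. P(yellow) is fixed at 1/3;
# P(red) + P(blue) = 2/3 but the split between them is unknown.

p_yellow = 1 / 3
for p_blue in [i / 60 * 2 / 3 for i in range(61)]:   # candidate values from 0 to 2/3
    p_red = 2 / 3 - p_blue
    prefers_yellow_in_A = 100 * p_yellow > 100 * p_red                    # bet yellow vs red
    prefers_red_or_blue_in_B = 100 * (p_red + p_blue) > 100 * (p_yellow + p_blue)
    assert not (prefers_yellow_in_A and prefers_red_or_blue_in_B)

print("no probability for blue rationalizes both strict preferences")
</code></pre>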
<p>&nbsp;</p> <h4 id="the-st-petersburg-paradox"><a href="#the-st-petersburg-paradox">8.6.5. The St Petersburg paradox</a></h4> <p>Another objection to the VNM approach (and to expected utility approaches generally), the <a href="http://en.wikipedia.org/wiki/St._Petersburg_paradox">St. Petersburg paradox</a>, draws on the possibility of infinite utilities. The St. Petersburg paradox is based around a game where a fair coin is tossed until it lands heads up. At this point, the agent receives a prize worth 2<sup>n</sup> utility, where <em>n</em> is equal to the number of times the coin was tossed during the game. The so-called paradox occurs because the expected utility of choosing to play this game is infinite and so, according to a standard expected utility approach, the agent should be willing to pay any finite amount to play the game. However, this seems unreasonable. Instead, it seems that the agent should only be willing to pay a relatively small amount to do so. As such, it seems that the expected utility approach gets something wrong.</p> <p>Various responses have been suggested. Most obviously, we could say that the paradox does not apply to VNM agents, since the VNM theorem assigns real numbers to all lotteries, and infinity is not a real number. But it's unclear whether this escapes the problem. After all, at its core, the St. Petersburg paradox is not about infinite utilities but rather about cases where expected utility approaches seem to overvalue some choice, and such cases seem to arise even when all utilities are finite. For example, if we let <em>L</em> be a finite limit on utility, we could consider the following scenario (from <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson, 2009, p. 85</a>):</p> <blockquote> <p>A fair coin is tossed until it lands heads up. The player thereafter receives a prize worth min {2<sup>n</sup> &middot; 10<sup>-100</sup>, L} units of utility, where <em>n</em> is the number of times the coin was tossed.</p> </blockquote> <p>In this case, even if an extremely low value is set for <em>L</em>, it seems that paying this amount to play the game is unreasonable. After all, as Peterson notes, about nine times out of ten an agent that plays this game will win no more than 8 &middot; 10<sup>-100</sup> utility. If paying 1 utility is, in fact, unreasonable in this case, then simply limiting an agent's utility to some finite value doesn't provide a defence of expected utility approaches. (Other problems abound. See <a href="/lw/kd/pascals_mugging_tiny_probabilities_of_vast/">Yudkowsky, 2007</a> for an interesting finite problem and <a href="http://philrsss.anu.edu.au/people-defaults/alanh/papers/vexing_expectations.pdf">Nover &amp; Hajek, 2004</a> for a particularly perplexing problem with links to the St Petersburg paradox.)</p> <p>As it stands, there is no agreement about precisely what the St Petersburg paradox reveals. Some people accept one of the various resolutions of the case and so find the paradox unconcerning. Others think the paradox reveals a serious problem for expected utility theories. Still others think the paradox is unresolved but don't think that we should respond by abandoning expected utility theory.</p>
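<p>To see why the expected value explodes in the first place, here is a minimal sketch of the standard (uncapped) game in Python: every possible round contributes exactly one unit to the expected payoff, so truncating the game at <em>N</em> tosses gives an expected payoff of <em>N</em>, which grows without bound, even though the large majority of plays end within a few tosses and pay very little.</p> <pre><code># A minimal sketch of the standard St. Petersburg game, truncated at n_max tosses.
# Each round n contributes (1/2)^n * 2^n = 1 to the expected payoff.

def truncated_expected_payoff(n_max):
    return sum((0.5 ** n) * (2 ** n) for n in range(1, n_max + 1))

for n_max in (10, 100, 1000):
    print(n_max, truncated_expected_payoff(n_max))   # prints 10.0, 100.0, 1000.0

print(sum(0.5 ** n for n in range(1, 4)))   # 0.875: 7 games in 8 end by toss 3, paying at most 8
</code></pre>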
<p>&nbsp;</p> <h2 id="does-axiomatic-decision-theory-offer-any-action-guidance"><a href="#does-axiomatic-decision-theory-offer-any-action-guidance">9. Does axiomatic decision theory offer any action guidance?</a></h2> <p>For the decision theories listed in section 8.2, it's often claimed the answer is "no." To explain this, I must first examine some differences between <em>direct</em> and <em>indirect</em> approaches to axiomatic decision theory.</p> <p><a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, ch. 4)</a> explains:</p> <blockquote> <p>In the indirect approach, which is the dominant approach, the decision maker does not prefer a risky act [or lottery] to another <em>because</em> the expected utility of the former exceeds that of the latter. Instead, the decision maker is asked to state a set of preferences over a set of risky acts... Then, if the set of preferences stated by the decision maker is consistent with a small number of structural constraints (axioms), it can be shown that her decisions can be described <em>as if</em> she were choosing what to do by assigning numerical probabilities and utilities to outcomes and then maximising expected utility...</p> </blockquote> <blockquote> <p>[In contrast] the direct approach seeks to generate preferences over acts from probabilities and utilities <em>directly</em> assigned to outcomes. In contrast to the indirect approach, it is not assumed that the decision maker has access to a set of preferences over acts before he starts to deliberate.</p> </blockquote> <p>The axiomatic decision theories listed in section 8.2 all follow the indirect approach. These theories, it might be said, cannot offer any action guidance because they require an agent to state its preferences over acts "up front." But an agent that states its preferences over acts already knows which act it prefers, so the decision theory can't offer any action guidance not already present in the agent's own stated preferences over acts.</p> <p><a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, ch. 10)</a> gives a practical example:</p> <blockquote> <p>For example, a forty-year-old woman seeking advice about whether to, say, divorce her husband, is likely to get very different answers from the [two approaches]. The [indirect approach] will advise the woman to first figure out what her preferences are over a very large set of risky acts, including the one she is thinking about performing, and then just make sure that all preferences are consistent with certain structural requirements. Then, as long as none of the structural requirements is violated, the woman is free to do whatever she likes, no matter what her beliefs and desires actually are... The [direct approach] will [instead] advise the woman to first assign numerical utilities and probabilities to her desires and beliefs, and then aggregate them into a decision by applying the principle of maximizing expected utility.</p> </blockquote> <p>Thus, it seems only the direct approach offers an agent any action guidance.
But the direct approach is very recent (<a href="http://www.amazon.com/Non-Bayesian-Decision-Theory-Beliefs-Desires/dp/9048179572/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson 2008</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2012/05/Cozic-Review-of-Non-Bayesian-Decision-Theory.pdf">Cozic 2011</a>), and only time will show whether it can stand up to professional criticism.</p> <p>Warning: Peterson's (<a href="http://www.amazon.com/Non-Bayesian-Decision-Theory-Beliefs-Desires/dp/9048179572/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">2008</a>) direct approach is confusingly called "non-Bayesian decision theory" despite assuming Bayesian probability theory.</p> <p>For other attempts to pull action guidance from normative decision theory, see <a href="/lw/fu1/why_you_must_maximize_expected_utility/">Fallenstein (2012)</a> and <a href="/lw/gap/a_fungibility_theorem/">Stiennon (2013)</a>.</p> <p>&nbsp;</p> <h2 id="how-does-probability-theory-play-a-role-in-decision-theory"><a href="#how-does-probability-theory-play-a-role-in-decision-theory">10. How does probability theory play a role in decision theory?</a></h2> <p>In order to calculate the expected utility of an act (or lottery), it is necessary to determine a probability for each outcome. In this section, I will explore some of the details of probability theory and its relationship to decision theory.</p> <p>For further introductory material to probability theory, see <a href="http://www.amazon.com/Scientific-Reasoning-The-Bayesian-Approach/dp/081269578X/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Howson &amp; Urbach (2005)</a>, <a href="http://www.amazon.com/Probability-Random-Processes-Geoffrey-Grimmett/dp/0198572220/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Grimmett &amp; Stirzaker (2001)</a>, and <a href="http://www.amazon.com/Probabilistic-Graphical-Models-Principles-Computation/dp/0262013193/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Koller &amp; Friedman (2009)</a>. This section draws heavily on <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, chs. 6 &amp; 7)</a>, which provides a very clear introduction to probability in the context of decision theory.</p> <p>&nbsp;</p> <h3 id="the-basics-of-probability-theory"><a href="#the-basics-of-probability-theory">10.1. The basics of probability theory</a></h3> <p>Intuitively, a probability is a number between 0 and 1 that labels how likely an event is to occur. If an event has probability 0 then it is impossible, and if it has probability 1 then it is certain to occur. If an event has a probability between these values, then it is more probable the higher this number is.</p> <p>As with EUM, probability theory can be derived from a small number of simple axioms. In the probability case, there are three of these, which are named the Kolmogorov axioms after the mathematician Andrey Kolmogorov. The first of these states that probabilities are real numbers between 0 and 1. The second states that if a set of events is mutually exclusive and exhaustive, then their probabilities sum to 1. The third states that if two events are mutually exclusive, then the probability that one or the other of them will occur is equal to the sum of their individual probabilities.</p> <p>From these three axioms, the remainder of probability theory can be derived. In the remainder of this section, I will explore some aspects of this broader theory.</p>
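<p>Here is a minimal sketch of the three axioms in Python, using a fair six-sided die as the running example (the die and the "even number" event are just illustrations):</p> <pre><code># A minimal sketch of the three Kolmogorov axioms for a fair die.
# The events 1, ..., 6 are mutually exclusive and exhaustive.

p = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

# Axiom 1: every probability is a real number between 0 and 1.
assert all(0 <= pr <= 1 for pr in p.values())

# Axiom 2: mutually exclusive, exhaustive events sum to 1.
assert abs(sum(p.values()) - 1) < 1e-9

# Axiom 3: for mutually exclusive events, P(A or B) = P(A) + P(B),
# e.g. the probability of rolling an even number.
p_even = p[2] + p[4] + p[6]
print(p_even)   # 0.5
</code></pre>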
<p>&nbsp;</p> <h3 id="bayes-theorem-for-updating-probabilities"><a href="#bayes-theorem-for-updating-probabilities">10.2. Bayes' theorem for updating probabilities</a></h3> <p>From the perspective of decision theory, one particularly important aspect of probability theory is the idea of a conditional probability. A conditional probability represents how probable something is given a piece of information. So, for example, a conditional probability could represent how likely it is that it will be raining, conditional on the fact that the weather forecaster predicted rain. A powerful technique for calculating conditional probabilities is Bayes' theorem (see <a href="http://yudkowsky.net/rational/bayes">Yudkowsky, 2003</a> for a detailed introduction). This formula states that:</p> <div class="figure"><img src="http://i.imgur.com/lTKXA.gif" alt="P(A|B)=(P(B|A)P(A))/P(B)" /> <p class="caption">P(A|B)=(P(B|A)P(A))/P(B)</p> </div> <p>Bayes' theorem is used to calculate the probability of some event, A, given some evidence, B. As such, this formula can be used to <em>update</em> probabilities based on new evidence. So if you are trying to predict the probability that it will rain tomorrow, and someone tells you that the weather forecaster predicted rain, then this formula tells you how to calculate a new probability that it will rain based on your existing information. The initial probability in such cases (before the new information is taken into account) is called the <em>prior probability</em>, and the result of applying Bayes' theorem is a new <em>posterior probability</em>.</p> <div class="figure"><img src="http://i.imgur.com/vM0yW.jpg" alt="Using Bayes' theorem to update probabilities based on the evidence provided by a weather forecast" /> <p class="caption">Using Bayes' theorem to update probabilities based on the evidence provided by a weather forecast</p> </div> <p>Bayes' theorem can be seen as solving the problem of how to update prior probabilities based on new information. However, it leaves open the question of how to determine the prior probability in the first place. In some cases, there will be no obvious way to do so. One solution to this problem suggests that any reasonable prior can be selected. Given enough evidence, repeated applications of Bayes' theorem will lead this prior probability to be updated to much the same posterior probability, even for people with widely different initial priors. As such, the initially selected prior is less crucial than it may at first seem.</p>
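<p>A minimal sketch of such an update in Python, using the rain example with made-up numbers (a 30% prior of rain, and a forecaster who predicts rain 80% of the time when it does rain and 20% of the time when it does not):</p> <pre><code># A minimal sketch of a Bayes' theorem update with hypothetical numbers.

p_rain = 0.3                    # prior: P(A)
p_forecast_given_rain = 0.8     # likelihood: P(B|A)
p_forecast_given_dry = 0.2      # P(B|not A)

# Total probability of the evidence: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_dry * (1 - p_rain)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_rain_given_forecast = p_forecast_given_rain * p_rain / p_forecast
print(round(p_rain_given_forecast, 3))   # 0.632: the posterior probability of rain
</code></pre>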
<p>&nbsp;</p> <h3 id="how-should-probabilities-be-interpreted"><a href="#how-should-probabilities-be-interpreted">10.3. How should probabilities be interpreted?</a></h3> <p>There are two main views about what probabilities mean: objectivism and subjectivism. Loosely speaking, the objectivist holds that probabilities tell us something about the external world while the subjectivist holds that they tell us something about our beliefs. Most decision theorists hold a subjectivist view about probability. According to this sort of view, probabilities represent subjective degrees of belief. So to say the probability of rain is 0.8 is to say that the agent under consideration has a high degree of belief that it will rain (see <a href="http://www.amazon.com/Probability-Theory-The-Logic-Science/dp/0521592712/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Jaynes, 2003</a> for a defense of this view). Note that, according to this view, another agent in the same circumstance could assign a different probability that it will rain.</p> <p>&nbsp;</p> <h4 id="why-should-degrees-of-belief-following-the-laws-of-probability"><a href="#why-should-degrees-of-belief-following-the-laws-of-probability">10.3.1. Why should degrees of belief follow the laws of probability?</a></h4> <p>One question that might be raised against the subjective account of probability is why, on this account, our degrees of belief should satisfy the Kolmogorov axioms. For example, why should our subjective degrees of belief in mutually exclusive, exhaustive events add to 1? One answer to this question shows that agents whose degrees of belief don&rsquo;t satisfy these axioms will be subject to Dutch Book bets. These are bets where the agent will inevitably lose money. <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, ch. 7)</a> explains:</p> <blockquote> <p>Suppose, for instance, that you believe to degree 0.55 that at least one person from India will win a gold medal in the next Olympic Games (event G), and that your subjective degree of belief is 0.52 that no Indian will win a gold medal in the next Olympic Games (event &not;G). Also suppose that a cunning bookie offers you a bet on both of these events. The bookie promises to pay you $1 for each event that actually takes place. Now, since your subjective degree of belief that G will occur is 0.55 it would be rational to pay up to $1&middot;0.55 = $0.55 for entering this bet. Furthermore, since your degree of belief in &not;G is 0.52 you should be willing to pay up to $0.52 for entering the second bet, since $1&middot;0.52 = $0.52. However, by now you have paid $1.07 for taking on two bets that are certain to give you a payoff of $1 <em>no matter what happens</em>...Certainly, this must be irrational. Furthermore, the reason why this is irrational is that your subjective degrees of belief violate the probability calculus.</p> </blockquote> <div class="figure"><img src="http://i.imgur.com/9xoLg.jpg" alt="A Dutch Book argument" /> <p class="caption">A Dutch Book argument</p> </div> <p>It can be proven that an agent is subject to Dutch Book bets if, and only if, their degrees of belief violate the axioms of probability. This provides an argument for why degrees of belief should satisfy these axioms.</p>
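<p>The arithmetic of Peterson's example is easy to check. A minimal sketch in Python:</p> <pre><code># A minimal sketch of the Dutch Book from Peterson's example: degrees of belief
# of 0.55 in G and 0.52 in not-G (which sum to more than 1). Buying both $1 bets
# at the prices those beliefs license guarantees a loss.

belief_G, belief_not_G = 0.55, 0.52
price_paid = 1.0 * belief_G + 1.0 * belief_not_G   # $1.07 total stake

for G_occurs in (True, False):
    payoff = 1.0   # exactly one of G, not-G occurs, so the payout is $1 either way
    print(G_occurs, round(payoff - price_paid, 2))  # -0.07 in both cases: a sure loss
</code></pre>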
<p>&nbsp;</p> <h4 id="measuring-subjective-probabilities"><a href="#measuring-subjective-probabilities">10.3.2. Measuring subjective probabilities</a></h4> <p>Another challenge raised by the subjective view is how we can measure probabilities. If these represent subjective degrees of belief, there doesn&rsquo;t seem to be an easy way to determine them based on observations of the world. However, a number of responses to this problem have been advanced, one of which is explained succinctly by <a href="http://www.amazon.com/Introduction-Decision-Cambridge-Introductions-Philosophy/dp/0521888379/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Peterson (2009, ch. 7)</a>:</p> <blockquote> <p>The main innovations presented by... Savage can be characterised as systematic procedures for linking probability... to claims about objectively observable behavior, such as preference revealed in choice behavior. Imagine, for instance, that we wish to measure Caroline's subjective probability that the coin she is holding in her hand will land heads up the next time it is tossed. First, we ask her which of the following very generous options she would prefer.</p> </blockquote> <blockquote> <p>A: "If the coin lands heads up you win a sports car; otherwise you win nothing."</p> </blockquote> <blockquote> <p>B: "If the coin <em>does not</em> land heads up you win a sports car; otherwise you win nothing."</p> </blockquote> <blockquote> <p>Suppose Caroline prefers A to B. We can then safely conclude that she thinks it is <em>more probable</em> that the coin will land heads up rather than not. This follows from the assumption that Caroline prefers to win a sports car rather than nothing, and that her preference between uncertain prospects is entirely determined by her beliefs and desires with respect to her prospects of winning the sports car...</p> </blockquote> <blockquote> <p>Next, we need to generalise the measurement procedure outlined above such that it allows us to always represent Caroline's degrees of belief with precise numerical probabilities. To do this, we need to ask Caroline to state preferences over a <em>much larger</em> set of options and then <em>reason backwards</em>... Suppose, for instance, that Caroline wishes to measure her subjective probability that her car worth $20,000 will be stolen within one year. If she considers $1,000 to be... the highest price she is prepared to pay for a gamble in which she gets $20,000 if the event S: "The car stolen within a year" takes place, and nothing otherwise, then Caroline's subjective probability for S is 1,000/20,000 = 0.05, given that she forms her preferences in accordance with the principle of maximising expected monetary value...</p> </blockquote> <blockquote> <p>The problem with this method is that very few people form their preferences in accordance with the principle of maximising expected monetary value. Most people have a decreasing marginal utility for money...</p> </blockquote> <blockquote> <p>Fortunately, there is a clever solution to [this problem]. The basic idea is to impose a number of structural conditions on preferences over uncertain options [e.g. the transitivity axiom]. Then, the subjective probability function is established by reasoning backwards while taking the structural axioms into account: Since the decision maker preferred some uncertain options to others, and her preferences... satisfy a number of structure axioms, the decision maker behaves <em>as if</em> she were forming her preferences over uncertain options by first assigning subjective probabilities and utilities to each option and thereafter maximising expected utility.</p> </blockquote> <blockquote> <p>A peculiar feature of this approach is, thus, that probabilities (and utilities) are derived from 'within' the theory.
The decision maker does not prefer an uncertain option to another <em>because</em> she judges the subjective probabilities (and utilities) of the outcomes to be more favourable than those of another. Instead, the... structure of the decision maker's preferences over uncertain options logically implies that they can be described <em>as if</em> her choices were governed by a subjective probability function and a utility function...</p> </blockquote> <blockquote> <p>...Savage's approach [seeks] to explicate subjective interpretations of the probability axioms by making certain claims about preferences over... uncertain options. But... why on earth should a theory of subjective probability involve assumptions about preferences, given that preferences and beliefs are separate entities? Contrary to what is claimed by [Savage and others], emotionally inert decision makers failing to muster any preferences at all... could certainly hold partial beliefs.</p> </blockquote> <p>Other theorists, for example <a href="http://www.amazon.com/Optimal-Statistical-Decisions-Classics-Library/dp/047168029X/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">DeGroot (1970)</a>, have proposed alternative approaches:</p> <blockquote> <p>DeGroot's basic assumption is that decision makers can make <em>qualitative</em> comparisons between pairs of events, and judge which one they think is most likely to occur. For example, he assumes that one can judge whether it is <em>more</em>, <em>less</em>, or <em>equally</em> likely, according to one's own beliefs, that it will rain today in Cambridge than in Cairo. DeGroot then shows that if the agent's qualitative judgments are sufficiently fine-grained and satisfy a number of structural axioms, then [they can be described by a probability distribution]. So in DeGroot's... theory, the probability function is obtained by fine-tuning qualitative data, thereby making them quantitative.</p> </blockquote> <p>&nbsp;</p> <h2 id="what-about-newcombs-problem-and-alternative-decision-algorithms"><a href="#what-about-newcombs-problem-and-alternative-decision-algorithms">11. What about "Newcomb's problem" and alternative decision algorithms?</a></h2> <p>Saying that a rational agent "maximizes expected utility" is, unfortunately, not specific enough. There are a variety of decision algorithms which aim to maximize expected utility, and they give <em>different answers</em> to some decision problems, for example "Newcomb's problem."</p> <p>In this section, we explain these decision algorithms and show how they perform on Newcomb's problem and related "Newcomblike" problems.</p> <p>General sources on this topic include: <a href="http://www.amazon.com/Paradoxes-Rationality-Cooperation-Prisoners-Newcombs/dp/0774802154/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Campbell &amp; Sowden (1985)</a>, <a href="http://kops.ub.uni-konstanz.de/bitstream/handle/urn:nbn:de:bsz:352-opus-5241/ledwig.pdf?sequence=1">Ledwig (2000)</a>, <a href="http://www.amazon.com/Foundations-Decision-Cambridge-Probability-Induction/dp/0521063566/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Joyce (1999)</a>, and <a href="http://intelligence.org/files/TDT.pdf">Yudkowsky (2010)</a>.
<a href="http://www.operalgo.com/PDF/Moertelmaier_Newcomblike_2013.pdf">Moertelmaier (2013)</a> discusses Newcomblike problems in the context of the agent-environment framework.</p> <p>&nbsp;</p> <h3 id="newcomblike-problems-and-two-decision-algorithms"><a href="#newcomblike-problems-and-two-decision-algorithms">11.1. Newcomblike problems and two decision algorithms</a></h3> <p>I'll begin with an exposition of several Newcomblike problems, so that I can refer to them in later sections. I'll also introduce our first two decision algorithms, so that I can show how one's choice of decision algorithm affects an agent's outcomes on these problems.</p> <p>&nbsp;</p> <h4 id="newcombs-problem"><a href="#newcombs-problem">11.1.1. Newcomb's Problem</a></h4> <p>Newcomb's problem was formulated by the physicist <a href="http://en.wikipedia.org/wiki/William_Newcomb">William Newcomb</a> but first published in <a href="http://faculty.arts.ubc.ca/rjohns/nozick_newcomb.pdf">Nozick (1969)</a>. Below I present a version of it inspired by <a href="http://intelligence.org/files/TDT.pdf">Yudkowsky (2010)</a>.</p> <p>A superintelligent machine named Omega visits Earth from another galaxy and shows itself to be very good at predicting events. This isn't because it has magical powers, but because it knows more science than we do, has billions of sensors scattered around the globe, and runs efficient algorithms for modeling humans and other complex systems with unprecedented precision &mdash; on an array of computer hardware the size of our moon.</p> <p>Omega presents you with two boxes. Box A is transparent and contains $1000. Box B is opaque and contains either $1 million or nothing. You may choose to take both boxes (called "two-boxing"), or you may choose to take only box B (called "one-boxing"). If Omega predicted you'll two-box, then Omega has left box B empty. If Omega predicted you'll one-box, then Omega has placed $1M in box B.</p> <p>By the time you choose, Omega has already left for its next game &mdash; the contents of box B won't change after you make your decision. Moreover, you've watched Omega play a thousand games against people like you, and on every occasion Omega predicted the human player's choice accurately.</p> <p>Should you one-box or two-box?</p> <div class="figure"><img src="http://i.imgur.com/4MFhs.jpg" alt="Newcomb&rsquo;s problem" /> <p class="caption">Newcomb&rsquo;s problem</p> </div> <p>Here's an argument for two-boxing. The $1M either <em>is</em> or <em>is not</em> in the box; your choice cannot affect the contents of box B now. So, you should two-box, because then you get $1K plus whatever is in box B. This is a straightforward application of the dominance principle (section 6.1). Two-boxing dominantes one-boxing.</p> <p>Convinced? Well, here's an argument for one-boxing. On all those earlier games you watched, everyone who two-boxed received $1K, and everyone who one-boxed received $1M. So you're almost certain that you'll get $1K for two-boxing and $1M for one-boxing, which means that to maximize your expected utility, you should one-box.</p> <p><a href="http://faculty.arts.ubc.ca/rjohns/nozick_newcomb.pdf">Nozick (1969)</a> reports:</p> <blockquote> <p>I have put this problem to a large number of people... To almost everyone it is perfectly clear and obvious what should be done. 
The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.</p> </blockquote> <p>This is not a "merely verbal" dispute (<a href="http://philreview.dukejournals.org/content/120/4/515.short">Chalmers 2011</a>). Decision theorists have offered different <em>algorithms</em> for making a choice, and they have different outcomes. Translated into English, the first algorithm (<em>evidential decision theory</em> or EDT) says "Take actions such that you would be glad to receive the news that you had taken them." The second algorithm (<em>causal decision theory</em> or CDT) says "Take actions which you expect to have a positive effect on the world."</p> <p>Many decision theorists have the intuition that CDT is right. But a CDT agent appears to "lose" on Newcomb's problem, ending up with $1000, while an EDT agent gains $1M. Proponents of EDT can ask proponents of CDT: "If you're so smart, why aren't you rich?" As <a href="http://www-ihpst.univ-paris1.fr/fichiers/programmes/20/Spohn-One-Boxing3.pdf">Spohn (2012)</a> writes, "this must be poor rationality that complains about the reward for irrationality." Or as <a href="http://intelligence.org/files/TDT.pdf">Yudkowsky (2010)</a> argues:</p> <blockquote> <p>An expected utility maximizer should maximize <em>utility</em> &mdash; not formality, reasonableness, or defensibility...</p> </blockquote> <p>In response to EDT's apparent "win" over CDT on Newcomb's problem, proponents of CDT have presented similar problems on which a CDT agent "wins" and an EDT agent "loses." Proponents of EDT, meanwhile, have replied with additional Newcomblike problems on which EDT wins and CDT loses. Let's explore each of them in turn.</p> <p>&nbsp;</p> <h4 id="evidential-and-causal-decision-theory"><a href="#evidential-and-causal-decision-theory">11.1.2. Evidential and causal decision theory</a></h4> <p>First, however, we will consider our two decision algorithms in a little more detail.</p> <p>EDT can be described simply: according to this theory, agents should use conditional probabilities when determining the expected utility of different acts. Specifically, they should use the probability of the world being in each possible state conditional on their carrying out the act under consideration. So in Newcomb&rsquo;s problem they consider the probability that Box B contains $1 million or nothing conditional on the evidence provided by their decision to one-box or two-box. This is how the theory formalizes the notion of an act providing good news.</p> <p>CDT is more complex, at least in part because it has been formulated in a variety of different ways and these formulations are equivalent to one another only if certain background assumptions are met. However, a good sense of the theory can be gained by considering the counterfactual approach, which is one of the more intuitive of these formulations. This approach utilizes the probabilities of certain counterfactual conditionals, which can be thought of as representing the causal influence of an agent&rsquo;s acts on the state of the world. These conditionals take the form &ldquo;if I were to carry out a certain act, then the world would be in a certain state."
So in Newcomb&rsquo;s problem, for example, this formulation of CDT considers the probability of counterfactuals like &ldquo;if I were to one-box, then Box B would contain $1 million&rdquo; and, in doing so, considers the causal influence of one-boxing on the contents of the boxes.</p> <p>The same distinction can be made in formal terms. Both EDT and CDT agree that decision theory should be about maximizing expected utility, where the expected utility of an act, A, given a set of possible outcomes, O, is defined as follows:</p> <p><img src="http://i.imgur.com/CSwK4.gif" alt="expected utility formula" />.</p> <p>In this equation, V(A &amp; O) represents the value to the agent of the combination of an act and an outcome. So this is the utility that the agent will receive if they carry out a certain act and a certain outcome occurs. Further, Pr<sub>A</sub>O represents the probability of each outcome occurring on the supposition that the agent carries out a certain act. It is in terms of this probability that CDT and EDT differ. EDT uses the conditional probability, Pr(O|A), while CDT uses the probability of subjunctive conditionals, Pr(A <img src="http://i.imgur.com/G8xec.gif" alt="" /> O).</p> <p>Using these two versions of the expected utility formula, it's possible to demonstrate in a formal manner why EDT and CDT give the advice they do in Newcomb's problem. To demonstrate this, it will help to make two simplifying assumptions. First, we will presume that each dollar of money is worth 1 unit of utility to the agent (and so will presume that the agent's utility is linear with money). Second, we will presume that Omega is a perfect predictor of human actions, so that if the agent two-boxes it provides definitive evidence that there is nothing in the opaque box, and if the agent one-boxes it provides definitive evidence that there is $1 million in this box. Given these assumptions, EDT calculates the expected utility of each decision as follows:</p> <div class="figure"><img src="http://i.imgur.com/aCe4Y.gif" alt="EU for two-boxing according to EDT" /> <p class="caption">EU for two-boxing according to EDT</p> </div> <div class="figure"><img src="http://i.imgur.com/vJtVr.gif" alt="EU for one-boxing according to EDT" /> <p class="caption">EU for one-boxing according to EDT</p> </div> <p>Given that one-boxing has a higher expected utility according to these calculations, an EDT agent will one-box.</p> <p>On the other hand, given that the agent's decision doesn't causally influence Omega's earlier prediction, CDT will use the same probability regardless of whether the agent one-boxes or two-boxes. The decision endorsed will be the same regardless of what probability we use, so, to demonstrate the theory, we can simply (and arbitrarily) assign a 0.5 probability that the opaque box has nothing in it and a 0.5 probability that it contains one million dollars. CDT then calculates the expected utility of each decision as follows:</p> <div class="figure"><img src="http://i.imgur.com/oyHGl.gif" alt="EU for two-boxing according to CDT" /> <p class="caption">EU for two-boxing according to CDT</p> </div> <div class="figure"><img src="http://i.imgur.com/7uX9t.gif" alt="EU for one-boxing according to CDT" /> <p class="caption">EU for one-boxing according to CDT</p> </div> <p>Given that two-boxing has a higher expected utility according to these calculations, a CDT agent will two-box. This approach demonstrates the result given more informally in the previous section: CDT agents will two-box in Newcomb's problem and EDT agents will one-box.</p>
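<p>Those calculations are simple enough to reproduce directly. Here is a minimal sketch in Python, under the same simplifying assumptions (utility linear in dollars, Omega a perfect predictor, and an arbitrary 0.5 probability of a full box for CDT):</p> <pre><code># A minimal sketch of the EDT and CDT calculations for Newcomb's problem.

payoff = {('one-box', 'full'): 1_000_000, ('one-box', 'empty'): 0,
          ('two-box', 'full'): 1_001_000, ('two-box', 'empty'): 1_000}

def edt_eu(act):
    # EDT: Pr(full | act). A perfect predictor fills the box iff the agent one-boxes.
    p_full = 1.0 if act == 'one-box' else 0.0
    return p_full * payoff[(act, 'full')] + (1 - p_full) * payoff[(act, 'empty')]

def cdt_eu(act, p_full=0.5):
    # CDT: the act has no causal influence on the prediction, so the same p_full
    # is used for both acts; any fixed value gives the same ranking.
    return p_full * payoff[(act, 'full')] + (1 - p_full) * payoff[(act, 'empty')]

for act in ('one-box', 'two-box'):
    print(act, edt_eu(act), cdt_eu(act))
# EDT: one-box 1,000,000 vs two-box 1,000, so EDT one-boxes.
# CDT: one-box 500,000 vs two-box 501,000, so CDT two-boxes.
</code></pre>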
<p>As mentioned before, there are also alternative formulations of CDT. What are these? For example, David Lewis <a href="http://www.tandfonline.com/doi/abs/10.1080/00048408112340011">(1981)</a> and Brian Skyrms <a href="http://www.amazon.com/Causal-Necessity-Pragmatic-Investigation-Laws/dp/0300023391/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">(1980)</a> both present approaches that rely on the partition of the world into states to capture causal information, rather than counterfactual conditionals. On Lewis&rsquo;s version of this account, for example, the agent calculates the expected utility of acts using their unconditional credence in states of the world that are <em>dependency hypotheses</em>, which are descriptions of the possible ways that the world can depend on the agent&rsquo;s actions. These dependency hypotheses intrinsically contain the required causal information.</p> <p>Other traditional approaches to CDT include the imaging approach of <a href="http://commonsenseatheism.com/wp-content/uploads/2012/09/Sobel-Probability-Chance-and-Choice-a-Theory-of-Rational-Agency.pdf">Sobel (1980)</a> (also see <a href="http://www.tandfonline.com/doi/abs/10.1080/00048408112340011">Lewis 1981</a>) and the unconditional expectations approach of Leonard Savage <a href="http://www.amazon.com/Foundations-Statistics-Leonard-J-Savage/dp/0486623491/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">(1954)</a>. Those interested in the various traditional approaches to CDT would do best to consult Lewis <a href="http://www.tandfonline.com/doi/abs/10.1080/00048408112340011">(1981)</a>, <a href="http://plato.stanford.edu/entries/decision-causal/">Weirich (2008)</a>, and <a href="http://www.amazon.com/Foundations-Decision-Cambridge-Probability-Induction/dp/0521063566/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Joyce (1999)</a>. More recently, work in computer science on a tool called causal Bayesian networks has led to an innovative approach to CDT that has received some attention in the philosophical literature (<a href="http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/0521773628/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Pearl 2000, ch. 4</a> and <a href="http://www-ihpst.univ-paris1.fr/fichiers/programmes/20/Spohn-One-Boxing3.pdf">Spohn 2012</a>).</p> <p>Now we return to an analysis of decision scenarios, armed with EDT and the counterfactual formulation of CDT.</p> <p>&nbsp;</p> <h4 id="medical-newcomb-problems"><a href="#medical-newcomb-problems">11.1.3. Medical Newcomb problems</a></h4> <p>Medical Newcomb problems share a similar form but come in many variants, including Solomon's problem (<a href="https://www.kellogg.northwestern.edu/research/math/papers/194.pdf">Gibbard &amp; Harper 1976</a>) and the smoking lesion problem (<a href="http://fitelson.org/few/few_05/egan.pdf">Egan 2007</a>).
Below I present a variant called the "chewing gum problem" (<a href="http://intelligence.org/files/TDT.pdf">Yudkowsky 2010</a>):</p> <blockquote> <p>Suppose that a recently published medical study shows that chewing gum seems to cause throat abscesses &mdash; an outcome-tracking study showed that of people who chew gum, 90% died of throat abscesses before the age of 50. Meanwhile, of people who do not chew gum, only 10% die of throat abscesses before the age of 50. The researchers, to explain their results, wonder if saliva sliding down the throat wears away cellular defenses against bacteria. Having read this study, would you choose to chew gum? But now a second study comes out, which shows that most gum-chewers have a certain gene, CGTA, and the researchers produce a table showing the following mortality rates:</p> </blockquote> <blockquote> <table border="0"> <tbody> <tr> <td>&nbsp;</td> <td>CGTA present</td> <td>CGTA absent</td> </tr> <tr> <td>Chew Gum</td> <td>89% die</td> <td>8% die</td> </tr> <tr> <td>Don&rsquo;t chew</td> <td>99% die</td> <td>11% die</td> </tr> </tbody> </table> </blockquote> <blockquote> <p>This table shows that whether you have the gene CGTA or not, your chance of dying of a throat abscess goes down if you chew gum. Why are fatalities so much higher for gum-chewers, then? Because people with the gene CGTA tend to chew gum and die of throat abscesses. The authors of the second study also present a test-tube experiment which shows that the saliva from chewing gum can kill the bacteria that form throat abscesses. The researchers hypothesize that because people with the gene CGTA are highly susceptible to throat abscesses, natural selection has produced in them a tendency to chew gum, which protects against throat abscesses. The strong correlation between chewing gum and throat abscesses is not because chewing gum causes throat abscesses, but because a third factor, CGTA, leads to chewing gum and throat abscesses.</p> </blockquote> <blockquote> <p>Having learned of this new study, would you choose to chew gum? Chewing gum helps protect against throat abscesses whether or not you have the gene CGTA. Yet a friend who heard that you had decided to chew gum (as people with the gene CGTA often do) would be quite alarmed to hear the news &mdash; just as she would be saddened by the news that you had chosen to take both boxes in Newcomb&rsquo;s Problem. This is a case where [EDT] seems to return the wrong answer, calling into question the validity of the... rule &ldquo;Take actions such that you would be glad to receive the news that you had taken them.&rdquo; Although the news that someone has decided to chew gum is alarming, medical studies nonetheless show that chewing gum protects against throat abscesses. [CDT's] rule of &ldquo;Take actions which you expect to have a positive physical effect on the world&rdquo; seems to serve us better.</p> </blockquote> <p>One response to this claim, called the <em>tickle defense</em> (<a href="http://www.jstor.org/discover/10.2307/20115662?uid=3737536&amp;uid=2129&amp;uid=2&amp;uid=70&amp;uid=4&amp;sid=21101205363271">Eells, 1981</a>), argues that EDT actually reaches the right decision in such cases. According to this defense, the most reasonable way to construe the &ldquo;chewing gum problem&rdquo; involves presuming that CGTA causes a desire (a mental &ldquo;tickle&rdquo;) which then causes the agent to be more likely to chew gum, rather than CGTA directly causing the action. 
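</p> <p>Before working through the tickle defense, it may help to see the disagreement in numbers. The sketch below is purely illustrative and is not part of Yudkowsky's presentation: the death rates come from the two studies quoted above, while the agent's 0.5 credence that they carry the CGTA gene is an arbitrary assumption (chewing comes out ahead under CDT for any such credence, since it is better whether or not the gene is present).</p> <pre><code># Chewing gum problem. Utility is 1 for surviving and 0 for dying of a throat abscess.
# Death rates come from the quoted studies; P_GENE is an arbitrary illustrative credence.

P_DIE_GIVEN_ACT = {"chew": 0.90, "don't chew": 0.10}       # first study (evidential)
P_DIE_GIVEN_ACT_AND_GENE = {                               # second study (by gene status)
    ("chew", "CGTA"): 0.89, ("chew", "no CGTA"): 0.08,
    ("don't chew", "CGTA"): 0.99, ("don't chew", "no CGTA"): 0.11,
}
P_GENE = 0.50  # assumed unconditional credence that the agent has CGTA

def edt_survival(act):
    # Naive EDT conditions directly on the act, as in the first study.
    return 1 - P_DIE_GIVEN_ACT[act]

def cdt_survival(act):
    # CDT: the act cannot change the gene, so weight the within-gene death
    # rates by the agent's unconditional credence in having the gene.
    p_die = (P_GENE * P_DIE_GIVEN_ACT_AND_GENE[(act, "CGTA")]
             + (1 - P_GENE) * P_DIE_GIVEN_ACT_AND_GENE[(act, "no CGTA")])
    return 1 - p_die

for act in ("chew", "don't chew"):
    print(f"{act:>10}: EDT survival = {edt_survival(act):.2f}, CDT survival = {cdt_survival(act):.3f}")
</code></pre> <p>On this naive reading, EDT treats chewing as strong evidence of having the gene and so recommends against it, while CDT recommends chewing because the act cannot change the agent's genes. The tickle defense's construal, on which CGTA produces a desire to chew and the desire produces the chewing, changes which conditional probabilities an EDT agent should use.</p> <p>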
Given this, if we presume that the agent already knows their own desires and hence already knows whether they&rsquo;re likely to have the CGTA gene, chewing gum will not provide the agent with further bad news. Consequently, an agent following EDT will chew in order to get the good news that they have decreased their chance of getting abscesses.</p> <p>Unfortunately, the tickle defense fails to achieve its aims. In introducing this approach, Eells hoped that EDT could be made to mimic CDT but without an allegedly inelegant reliance on causation. However, <a href="http://www.amazon.com/Taking-Chances-Cambridge-Probability-Induction/dp/0521038987/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Sobel (1994, ch. 2)</a> demonstrated that the tickle defense failed to ensure that EDT and CDT would decide equivalently in all cases. On the other hand, those who feel that EDT originally got it right by one-boxing in Newcomb&rsquo;s problem will be disappointed to discover that the tickle defense leads an agent to two-box in some versions of Newcomb&rsquo;s problem and so solves one problem for the theory at the expense of introducing another.</p> <p>So just as CDT &ldquo;loses&rdquo; on Newcomb&rsquo;s problem, EDT will "lose&rdquo; on Medical Newcomb problems (if the tickle defense fails) or will join CDT and "lose" on Newcomb&rsquo;s Problem itself (if the tickle defense succeeds).</p> <p>&nbsp;</p> <h4 id="newcombs-soda"><a href="#newcombs-soda">11.1.4. Newcomb's soda</a></h4> <p>There are also similar problematic cases for EDT where the evidence provided by your decision relates not to a feature that you were born (or created) with but to some other feature of the world. One such scenario is the <em>Newcomb&rsquo;s soda</em> problem, introduced in <a href="http://intelligence.org/files/TDT.pdf">Yudkowsky (2010)</a>:</p> <blockquote> <p>You know that you will shortly be administered one of two sodas in a double-blind clinical test. After drinking your assigned soda, you will enter a room in which you find a chocolate ice cream and a vanilla ice cream. The first soda produces a strong but entirely subconscious desire for chocolate ice cream, and the second soda produces a strong subconscious desire for vanilla ice cream. By &ldquo;subconscious&rdquo; I mean that you have no introspective access to the change, any more than you can answer questions about individual neurons firing in your cerebral cortex. You can only infer your changed tastes by observing which kind of ice cream you pick.</p> </blockquote> <blockquote> <p>It so happens that all participants in the study who test the Chocolate Soda are rewarded with a million dollars after the study is over, while participants in the study who test the Vanilla Soda receive nothing. But subjects who actually eat vanilla ice cream receive an additional thousand dollars, while subjects who actually eat chocolate ice cream receive no additional payment. You can choose one and only one ice cream to eat. A pseudo-random algorithm assigns sodas to experimental subjects, who are evenly divided (50/50) between Chocolate and Vanilla Sodas. You are told that 90% of previous research subjects who chose chocolate ice cream did in fact drink the Chocolate Soda, while 90% of previous research subjects who chose vanilla ice cream did in fact drink the Vanilla Soda. 
Which ice cream would you eat?</p> </blockquote> <div class="figure"><img src="http://i.imgur.com/FAZnb.jpg" alt="Newcomb&rsquo;s soda" /> <p class="caption">Newcomb&rsquo;s soda</p> </div> <p>In this case, an EDT agent will decide to eat chocolate ice cream as this would provide evidence that they drank the chocolate soda and hence that they will receive $1 million after the experiment. However, this seems to be the wrong decision and so, once again, the EDT agent &ldquo;loses&rdquo;.</p> <p>&nbsp;</p> <h4 id="bostroms-meta-newcomb-problem"><a href="#bostroms-meta-newcomb-problem">11.1.5. Bostrom's meta-Newcomb problem</a></h4> <p>In response to attacks on their theory, the proponent of EDT can present alternative scenarios where EDT &ldquo;wins&rdquo; and it is CDT that &ldquo;loses&rdquo;. One such case is the <em>meta-Newcomb problem</em> proposed in <a href="http://www.nickbostrom.com/papers/newcomb.html">Bostrom (2001)</a>. Adapted to fit my earlier story about Omega the superintelligent machine (section 11.1.1), the problem runs like this: Either Omega has <em>already</em> placed $1M or nothing in box B (depending on its prediction about your choice), or else Omega is watching as you choose and <em>after</em> your choice it will place $1M into box B only if you have one-boxed. But you don't know which is the case. Omega makes its move before the human player's choice about half the time, and the rest of the time it makes its move <em>after</em> the player's choice.</p> <p>But now suppose there is another superintelligent machine, Meta-Omega, who has a perfect track record of predicting both Omega's choices and the choices of human players. Meta-Omega tells you that either you will two-box and Omega will "make its move" <em>after</em> you make your choice, or else you will one-box and Omega has <em>already</em> made its move (and gone on to the next game, with someone else).</p> <p>Here, an EDT agent one-boxes and walks away with a million dollars. On the face of it, however, a CDT agent faces a dilemma: if she two-boxes then Omega's action depends on her choice, so the "rational" choice is to one-box. But if the CDT agent one-boxes, then Omega's action temporally precedes (and is thus physically independent of) her choice, so the "rational" action is to two-box. It might seem, then, that a CDT agent will be unable to reach any decision in this scenario. However, further reflection reveals that the issue is more complicated. According to CDT, what the agent ought to do in this scenario depends on their credences about their own actions. If they have a high credence that they will two-box, they ought to one-box and if they have a high credence that they will one-box, they ought to two box. Given that the agent's credences in their actions are not given to us in the description of the meta-Newcomb problem, the scenario is underspecified and it is hard to know what conclusions should be drawn from it.</p> <p>&nbsp;</p> <h4 id="the-psychopath-button"><a href="#the-psychopath-button">11.1.6. The psychopath button</a></h4> <p>Fortunately, another case has been introduced where, according to CDT, what an agent ought to do depends on their credences about what they will do. This is the <em>psychopath button</em>, introduced in <a href="http://philreview.dukejournals.org/content/116/1/93.citation">Egan (2007)</a>:</p> <blockquote> <p>Paul is debating whether to press the &ldquo;kill all psychopaths&rdquo; button. It would, he thinks, be much better to live in a world with no psychopaths. 
Unfortunately, Paul is quite confident that only a psychopath would press such a button. Paul very strongly prefers living in a world with psychopaths to dying. Should Paul press the button?</p> </blockquote> <p>Many people think Paul should not. After all, if he does so, he is almost certainly a psychopath and so pressing the button will almost certainly cause his death. This is also the response that an EDT agent will give. After all, pushing the button would provide the agent with the bad news that they are almost certainly a psychopath and so will die as a result of their action.</p> <p>On the other hand, if Paul is fairly certain that he is not a psychopath, then CDT will say that he ought to press the button. CDT will note that, given Paul&rsquo;s confidence that he isn&rsquo;t a psychopath, his decision will almost certainly have a positive impact as it will result in the death of all psychopaths and Paul&rsquo;s survival. On the face of it, then, a CDT agent would decide inappropriately in this case by pushing the button. Importantly, unlike in the meta-Newcomb problem, the agent's credences about their own behavior are specified in Egan's full version of this scenario (in non-numeric terms, the agent thinks they're unlikely to be a psychopath and hence unlikely to press the button).</p> <p>However, in order to produce this problem for CDT, Egan made a number of assumptions about how an agent should decide when what they ought to do depends on what they think they will do. In response, alternative views about deciding in such cases have been advanced (particularly in <a href="http://www.jstor.org/discover/10.2307/40267481?uid=3737536&amp;uid=2&amp;uid=4&amp;sid=21101299066461">Arntzenius, 2008</a> and <a href="http://rd.springer.com/article/10.1007/s11229-011-0022-6">Joyce, 2012</a>). Given these factors, opinions are split about whether the psychopath button problem does in fact pose a challenge to CDT.</p> <p>&nbsp;</p> <h4 id="parfits-hitchhiker"><a href="#parfits-hitchhiker">11.1.7. Parfit's hitchhiker</a></h4> <p>Not all decision scenarios are problematic for just one of EDT or CDT. There are also cases where an EDT agent and a CDT agent will both "lose". One such case is <em>Parfit&rsquo;s Hitchhiker</em> (<a href="http://www.amazon.com/Reasons-Persons-Oxford-Paperbacks-Parfit/dp/019824908X/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Parfit, 1984, p. 7</a>):</p> <blockquote> <p>Suppose that I am driving at midnight through some desert. My car breaks down. You are a stranger, and the only other driver near. I manage to stop you, and I offer you a great reward if you rescue me. I cannot reward you now, but I promise to do so when we reach my home. Suppose next that I am <em>transparent</em>, unable to deceive others. I cannot lie convincingly. Either a blush, or my tone of voice, always gives me away. Suppose, finally, that I know myself to be never self-denying. If you drive me to my home, it would be worse for me if I gave you the promised reward. Since I know that I never do what will be worse for me, I know that I shall break my promise. Given my inability to lie convincingly, you know this too. You do not believe my promise, and therefore leave me stranded in the desert.</p> </blockquote> <p>In this scenario the agent "loses" if they would later refuse to give the stranger the reward. However, both EDT agents and CDT agents will refuse to do so.
After all, by this point the agent will already be safe, so giving the reward can neither provide good news about, nor cause, their safety. So this seems to be a case where both theories &ldquo;lose&rdquo;.</p> <p>&nbsp;</p> <h4 id="transparent-newcombs-problem"><a href="#transparent-newcombs-problem">11.1.8. Transparent Newcomb's problem</a></h4> <p>There are also other cases where both EDT and CDT "lose". One of these is the <em>Transparent Newcomb's problem</em> which, in at least one version, is due to <a href="http://www.amazon.com/Good-Real-Demystifying-Paradoxes-Bradford/dp/0262042339/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Drescher (2006, pp. 238-242)</a>. This scenario is like the original Newcomb's problem but, in this case, both boxes are transparent, so you can see their contents when you make your decision. Again, Omega has filled Box A with $1000 and Box B with either $1 million or nothing based on a prediction of your behavior. Specifically, Omega has predicted how you would decide if you witnessed $1 million in Box B. If Omega predicted that you would one-box in this case, it placed $1 million in Box B. On the other hand, if Omega predicted that you would two-box in this case then it placed nothing in Box B.</p> <p>Both EDT and CDT agents will two-box in this case. After all, the contents of the boxes are determined and known, so the agent's decision can neither provide good news about what they contain nor cause them to contain something desirable. As with two-boxing in the original version of Newcomb&rsquo;s problem, many philosophers will endorse this behavior.</p> <p>However, it&rsquo;s worth noting that Omega will almost certainly have predicted this decision and so filled Box B with nothing. CDT and EDT agents will end up with $1000. On the other hand, just as in the original case, the agent that one-boxes will end up with $1 million. So this is another case where both EDT and CDT &ldquo;lose&rdquo;. Consequently, for those who agree with the earlier comments (in section 11.1.1) that a decision theory shouldn't lead an agent to "lose", neither of these theories will be satisfactory.</p> <p>&nbsp;</p> <h4 id="counterfactual-mugging"><a href="#counterfactual-mugging">11.1.9. Counterfactual mugging</a></h4> <p>Another similar case, known as <em>counterfactual mugging</em>, was developed in <a href="/lw/3l/counterfactual_mugging/">Nesov (2009)</a>:</p> <blockquote> <p>Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, the Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.</p> </blockquote> <p>Should you give up the $100?</p> <p>Both CDT and EDT say no. After all, giving up your money neither provides good news about nor influences your chances of getting $10,000 out of the exchange. Further, this intuitively seems like the right decision. On the face of it, then, it is appropriate to retain your money in this case.</p> <p>However, presuming you take Omega to be perfectly trustworthy, there seems to be room to debate this conclusion.
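</p> <p>One way to make the debate concrete is to compare, before the coin is tossed, the expected value of being an agent who pays with that of being an agent who refuses. The calculation below is a minimal sketch, not part of the original discussion; it assumes Omega is perfectly trustworthy and predicts your disposition correctly, and it uses the payoffs from the scenario above.</p> <pre><code># Ex-ante value of the two possible dispositions in counterfactual mugging,
# evaluated before the fair coin is tossed. Omega is assumed to be perfectly
# trustworthy and to predict the agent's disposition correctly.

P_HEADS = 0.5

def disposition_value(pays_on_tails: bool) -> float:
    if pays_on_tails:
        # Heads: Omega pays $10,000 (because you would have paid on tails).
        # Tails: you hand over $100.
        return P_HEADS * 10_000 + (1 - P_HEADS) * (-100)
    # A refuser receives nothing on heads and pays nothing on tails.
    return 0.0

print("pays on tails:   ", disposition_value(True))    # 4950.0
print("refuses on tails:", disposition_value(False))   # 0.0
</code></pre> <p>Before the toss, the paying disposition is worth $4,950 in expectation and the refusing disposition is worth nothing.</p> <p>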
If you are the sort of agent that gives up the $100 in counterfactual mugging then you will tend to do better than the sort of agent that won&rsquo;t give up the $100. Of course, in the particular case at hand you will lose but rational agents often lose in specific cases (as, for example, when such an agent loses a rational bet). It could be argued that what a rational agent should not do is be the type of agent that loses. Given that agents that refuse to give up the $100 are the type of agent that loses, there seem to be grounds to claim that counterfactual mugging is another case where both CDT and EDT act inappropriately.</p> <p>&nbsp;</p> <h4 id="prisoners-dilemma"><a href="#prisoners-dilemma">11.1.10. Prisoner's dilemma</a></h4> <p>Before moving on to a more detailed discussion of various possible decision theories, I&rsquo;ll consider one final scenario: the <em>prisoner&rsquo;s dilemma</em>. <a href="http://www.amazon.com/Choices-An-Introduction-Decision-Theory/dp/0816614407/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Resnik (1987, pp. 147-148 )</a> outlines this scenario as follows:</p> <blockquote> <p>Two prisoners...have been arrested for vandalism and have been isolated from each other. There is sufficient evidence to convict them on the charge for which they have been arrested, but the prosecutor is after bigger game. He thinks that they robbed a bank together and that he can get them to confess to it. He summons each separately to an interrogation room and speaks to each as follows: "I am going to offer the same deal to your partner, and I will give you each an hour to think it over before I call you back. This is it: If one of you confesses to the bank robbery and the other does not, I will see to it that the confessor gets a one-year term and that the other guy gets a twenty-five year term. If you both confess, then it's ten years apiece. If neither of you confesses, then I can only get two years apiece on the vandalism charge..."</p> </blockquote> <p>The decision matrix of each vandal will be as follows:</p> <table border="0" cellspacing="5" cellpadding="3"> <tbody> <tr> <td>&nbsp;</td> <td><em>Partner confesses</em></td> <td><em>Partner lies</em></td> </tr> <tr> <td><em>Confess</em></td> <td>10 years in jail</td> <td>1 year in jail</td> </tr> <tr> <td><em>Lie</em></td> <td>25 years in jail</td> <td>2 years in jail</td> </tr> </tbody> </table> <p>Faced with this scenario, a CDT agent will confess. After all, the agent&rsquo;s decision can&rsquo;t influence their partner&rsquo;s decision (they&rsquo;ve been isolated from one another) and so the agent is better off confessing regardless of what their partner chooses to do. According to the majority of decision (and game) theorists, confessing is in fact the rational decision in this case.</p> <p>Despite this, however, an EDT agent may lie in a prisoner&rsquo;s dilemma. Specifically, if they think that their partner is similar enough to them, the agent will lie because doing so will provide the good news that they will both lie and hence that they will both get two years in jail (good news as compared with the bad news that they will both confess and hence that they will get 10 years in jail).</p> <p>To many people, there seems to be something compelling about this line of reasoning. 
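</p> <p>To make the evidential reasoning concrete, here is a minimal sketch that is not part of the original text. Utility is measured as the negative of years spent in jail, and the 0.9 probability that the partner reaches the same decision as the agent is an arbitrary illustrative assumption.</p> <pre><code># Prisoner's dilemma. JAIL[(my_act, partner_act)] is taken from the decision matrix above;
# utility is the negative of years spent in jail.
JAIL = {
    ("confess", "confess"): 10, ("confess", "lie"): 1,
    ("lie", "confess"): 25, ("lie", "lie"): 2,
}

SIMILARITY = 0.9           # assumed P(partner makes the same choice I do); illustrative only
P_PARTNER_CONFESSES = 0.5  # assumed unconditional credence used by CDT; illustrative only

def edt_utility(my_act):
    other_act = "lie" if my_act == "confess" else "confess"
    return -(SIMILARITY * JAIL[(my_act, my_act)]
             + (1 - SIMILARITY) * JAIL[(my_act, other_act)])

def cdt_utility(my_act):
    # My choice cannot influence my isolated partner, so the same credence
    # about their behavior is used whatever I do.
    return -(P_PARTNER_CONFESSES * JAIL[(my_act, "confess")]
             + (1 - P_PARTNER_CONFESSES) * JAIL[(my_act, "lie")])

for act in ("confess", "lie"):
    print(f"{act:>7}: EDT EU = {edt_utility(act):6.1f}, CDT EU = {cdt_utility(act):6.1f}")
</code></pre> <p>With these assumptions the EDT agent prefers lying (an expected 4.3 years in jail rather than 9.1), while the CDT agent prefers confessing whatever credence it assigns to its partner's behavior, since confessing leads to a shorter sentence in each column of the matrix.</p> <p>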
<a href="http://www.amazon.com/Metamagical-Themas-Questing-Essence-Pattern/dp/0465045669/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Douglas Hofstadter (1985, pp. 737-780)</a>, for example, has argued that an agent acting &ldquo;superrationally&rdquo; would co-operate with other superrational agents for precisely this sort of reason: a superrational agent would take into account the fact that other such agents will go through the same thought process in the <em>prisoner&rsquo;s dilemma</em> and so make the same decision. As such, it is better that the decision both agents reach be to lie than that it be to confess. More broadly, it could perhaps be argued that a rational agent should lie in the <em>prisoner&rsquo;s dilemma</em> as long as they believe that they are similar enough to their partner that they are likely to reach the same decision.</p> <div class="figure"><img src="http://i.imgur.com/fPUcm.jpg" alt="An argument for cooperation in the prisoners&rsquo; dilemma" /> <p class="caption">An argument for cooperation in the prisoners&rsquo; dilemma</p> </div> <p>It is unclear, then, precisely what should be concluded from the prisoner&rsquo;s dilemma. However, for those who are sympathetic to Hofstadter&rsquo;s point or the line of reasoning appealed to by the EDT agent, the scenario seems to provide an additional reason to seek out an alternative theory to CDT.</p> <p>&nbsp;</p> <h3 id="benchmark-theory-bt"><a href="#benchmark-theory-bt">11.2. Benchmark theory (BT)</a></h3> <p>One recent response to the apparent failure of EDT to decide appropriately in medical Newcomb problems and of CDT to decide appropriately in the psychopath button is Benchmark Theory (BT), which was developed in <a href="http://www.springerlink.com/content/a66107137n821610/?MUD=MP">Wedgwood (2011)</a> and discussed further in <a href="http://philreview.dukejournals.org/content/119/1/1.abstract">Briggs (2010)</a>.</p> <p>In English, we could think of this decision algorithm as saying that agents should decide so as to give their future self good news about how well off they are compared to how well off they could have been. In formal terms, BT uses the following formula to calculate the expected utility of an act, A:</p> <p><img src="http://i.imgur.com/fUjmj.gif" alt="BT expected value formula" />.</p> <p>In other words, it uses the conditional probability, as in EDT, but calculates the value differently (as indicated by the use of V&rsquo; rather than V). V&rsquo; is calculated relative to a benchmark value in order to give a comparative measure of value (both of the above sources go into more detail about this process).</p> <p>Taking the informal perspective, in the <em>chewing gum problem</em>, BT will note that by chewing gum, the agent will always get the good news that they are comparatively better off than they could have been (because chewing gum helps control throat abscesses), whereas by not chewing, the agent will always get the bad news that they could have been comparatively better off by chewing. As such, a BT agent will chew in this scenario.</p> <p>Further, BT seems to reach what many consider to be the right decision in the <em>psychopath button</em>. In this case, the BT agent will note that if they push the button they will get the bad news that they are almost certainly a psychopath and so that they would have been comparatively much better off by not pushing (as pushing will kill them).
On the other hand, if they don&rsquo;t push they will get the less bad news that they are almost certainly not a psychopath and so could have been comparatively a little better off if they had pushed the button (as this would have killed all the psychopaths but not them). So refraining from pushing the button gives the less bad news and so is the rational decision.</p> <p>On the face of it, then, there seem to be strong reasons to find BT compelling: it decides appropriately in these scenarios while, according to some people, EDT and CDT only decide appropriately in one or the other of them.</p> <p>Unfortunately, a BT agent will fail to decide appropriately in other scenarios. First, those who hold that one-boxing is the appropriate decision in Newcomb&rsquo;s problem will immediately find a flaw in BT. After all, in this scenario two-boxing gives the good news that the agent did comparatively better than they could have done (because they gain the $1000 from Box A, which is more than they would have received otherwise) while one-boxing brings the bad news that they did comparatively worse than they could have done (as they did not receive this money). As such, a BT agent will two-box in Newcomb&rsquo;s problem.</p> <p>Further, <a href="http://philreview.dukejournals.org/content/119/1/1.abstract">Briggs (2010)</a> argues, though <a href="http://www.springerlink.com/content/a66107137n821610/?MUD=MP">Wedgwood (2011)</a> denies, that BT suffers from other problems. As such, even for those who support two-boxing in Newcomb&rsquo;s problem, it could be argued that BT doesn&rsquo;t represent an adequate theory of choice. It is unclear, then, whether BT is a desirable replacement for alternative theories.</p> <p>&nbsp;</p> <h3 id="timeless-decision-theory-tdt"><a href="#timeless-decision-theory-tdt">11.3. Timeless decision theory (TDT)</a></h3> <p><a href="http://intelligence.org/files/TDT.pdf">Yudkowsky (2010)</a> offers another decision algorithm, <em>timeless decision theory</em> or TDT (see also <a href="http://intelligence.org/files/Comparison.pdf">Altair, 2013</a>). Specifically, TDT is intended as an explicit response to the idea that a theory of rational choice should lead an agent to &ldquo;win&rdquo;. As such, it will appeal to those who think it is appropriate to one-box in Newcomb&rsquo;s problem and chew in the chewing gum problem.</p> <p>In English, this algorithm can be approximated as saying that an agent ought to choose as if CDT were right, but as if they were determining not their actual decision but rather the result of the abstract computation of which their decision is one concrete instance. Formalizing this decision algorithm would require a substantial document in its own right and so will not be carried out in full here. Briefly, however, TDT is built on top of causal Bayesian networks <a href="http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/0521773628/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">(Pearl, 2000)</a>, which are graphs where the arrows represent causal influence.
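</p> <p>To give a sense of what such a network looks like, here is a toy example, not drawn from Yudkowsky's paper, built around the smoking-lesion structure mentioned earlier: a lesion causally promotes both smoking and cancer, while smoking itself has no causal effect on cancer. Conditioning on the act (as EDT does) treats it as evidence about the lesion; intervening on the act (as network-based causal reasoning does) severs the arrow into it. All of the probabilities are made up purely for illustration.</p> <pre><code># A minimal causal Bayesian network: Lesion -> Smoke, Lesion -> Cancer,
# and no arrow from Smoke to Cancer. All numbers are made up for illustration.
from itertools import product

P_LESION = 0.3
P_SMOKE_GIVEN_LESION = {True: 0.9, False: 0.2}
P_CANCER_GIVEN_LESION = {True: 0.8, False: 0.05}

def joint(lesion, smoke, cancer, do_smoke=None):
    """Probability of one full assignment. Passing do_smoke fixes Smoke by
    intervention, which severs the Lesion -> Smoke arrow."""
    p = P_LESION if lesion else 1 - P_LESION
    if do_smoke is None:
        p_smoke = P_SMOKE_GIVEN_LESION[lesion]
        p *= p_smoke if smoke else 1 - p_smoke
    else:
        p *= 1.0 if smoke == do_smoke else 0.0
    p_cancer = P_CANCER_GIVEN_LESION[lesion]
    p *= p_cancer if cancer else 1 - p_cancer
    return p

def prob_cancer(smoke, do=False):
    kwargs = {"do_smoke": smoke} if do else {}
    numerator = sum(joint(l, smoke, True, **kwargs) for l in (True, False))
    denominator = sum(joint(l, smoke, c, **kwargs) for l, c in product((True, False), repeat=2))
    return numerator / denominator

print("P(cancer | smoke)        =", round(prob_cancer(True), 3))          # conditioning
print("P(cancer | no smoke)     =", round(prob_cancer(False), 3))
print("P(cancer | do(smoke))    =", round(prob_cancer(True, do=True), 3)) # intervening
print("P(cancer | do(no smoke)) =", round(prob_cancer(False, do=True), 3))
</code></pre> <p>Conditioning makes smoking look dangerous (it is evidence of the lesion), while intervening shows that it makes no difference to the chance of cancer, which is exactly the distinction the causal decision theorist needs.</p> <p>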
TDT supplements these graphs by adding nodes representing abstract computations and taking the abstract computation that determines an agent&rsquo;s decision to be the object of choice rather than the concrete decision itself (see <a href="http://intelligence.org/files/TDT.pdf">Yudkowsky, 2010</a> for a more detailed description).</p> <p>Returning to an informal discussion, an example will help clarify the form taken by TDT: imagine that two perfect replicas of a person are placed in identical rooms and asked to make the same decision. While each replica will make their own decision, in doing so, they will be carrying out the same computational process. As such, TDT will say that the replicas ought to act as if they are determining the result of this process and hence as if they are deciding the behavior of both copies.</p> <p>Something similar can be said about Newcomb&rsquo;s problem. In this case it is almost like there is again a replica of the agent: Omega&rsquo;s model of the agent that it used to predict the agent&rsquo;s behavior. Both the original agent and this &ldquo;replica&rdquo; respond to the same abstract computational process. In other words, both Omega&rsquo;s prediction and the agent&rsquo;s behavior are influenced by this process. As such, TDT advises the agent to act as if they are determining the result of this process and, hence, as if they can determine Omega&rsquo;s box-filling behavior. Consequently, a TDT agent will one-box in order to determine the result of this abstract computation in a way that leads to $1 million being placed in Box B.</p> <p>TDT also succeeds in other areas. For example, in the chewing gum problem there is no &ldquo;replica&rdquo; agent, so TDT will decide in line with standard CDT and choose to chew gum. Further, in the prisoner&rsquo;s dilemma, a TDT agent will lie if its partner is another TDT agent (or a relevantly similar agent). After all, in this case both agents will carry out the same computational process and so TDT will advise that the agent act as if they are determining this process and hence simultaneously determining both their own and their partner&rsquo;s decision. If so, then it is better for the agent that both of them lie than that both of them confess.</p> <p>However, despite its success, TDT also &ldquo;loses&rdquo; in some decision scenarios. For example, in counterfactual mugging, a TDT agent will not choose to give up the $100. This might seem surprising. After all, as with Newcomb&rsquo;s problem, this case involves Omega predicting the agent&rsquo;s behavior and hence involves a &ldquo;replica&rdquo;. However, this case differs in that the agent knows that the coin came up tails and so knows that they have nothing to gain by giving up the money.</p> <p>For those who feel that a theory of rational choice should lead an agent to &ldquo;win&rdquo;, then, TDT seems like a step in the right direction, but further work is required if it is to &ldquo;win&rdquo; in the full range of decision scenarios.</p> <p>&nbsp;</p> <h3 id="decision-theory-and-winning"><a href="#decision-theory-and-winning">11.4. Decision theory and &ldquo;winning&rdquo;</a></h3> <p>In the previous section, I discussed TDT, a decision algorithm that could be advanced as a replacement for CDT and EDT. One of the primary motivations for developing TDT is a sense that both CDT and EDT fail to reason in a desirable manner in some decision scenarios.
However, despite acknowledging that CDT agents end up worse off in Newcomb's Problem, many (and perhaps the majority of) decision theorists are proponents of CDT. On the face of it, this may seem to suggest that these decision theorists aren't interested in developing a decision algorithm that "wins" but rather have some other aim in mind. If so then this might lead us to question the value of developing one-boxing decision algorithms.</p> <p>However, the claim that most decision theorists don&rsquo;t care about finding an algorithm that &ldquo;wins&rdquo; mischaracterizes their position. After all, proponents of CDT tend to take the challenge posed by the fact that CDT agents &ldquo;lose&rdquo; in Newcomb's problem seriously (in the philosophical literature, it's often referred to as the <em>Why ain'cha rich?</em> problem). A common reaction to this challenge is neatly summarized in <a href="http://www.amazon.com/Foundations-Decision-Cambridge-Probability-Induction/dp/0521641640/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321928423&amp;linkCode=as2&amp;tag=lesswrong-20">Joyce (1999, p. 153-154 )</a> as a response to a hypothetical question about why, if two-boxing is rational, the CDT agent does not end up as rich as an agent that one-boxes:</p> <blockquote> <p>Rachel has a perfectly good answer to the "Why ain't you rich?" question. "I am not rich," she will say, "because I am not the kind of person [Omega] thinks will refuse the money. I'm just not like you, Irene [the one-boxer]. Given that I know that I am the type who takes the money, and given that [Omega] knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in [the box]. The $1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it."</p> </blockquote> <blockquote> <p>Irene may want to press the point here by asking, "But don't you wish you were like me, Rachel?"... Rachel can and should admit that she <em>does</em> wish she were more like Irene... At this point, Irene will exclaim, "You've admitted it! It wasn't so smart to take the money after all." Unfortunately for Irene, her conclusion does not follow from Rachel's premise. Rachel will patiently explain that wishing to be a [one-boxer] in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 <em>whatever type one is</em>. When Rachel wishes she was Irene's type she is wishing for <em>Irene's options</em>, not sanctioning her choice... While a person who knows she will face (has faced) a Newcomb problem might wish that she were (had been) the type that [Omega] labels a [one-boxer], this wish does not provide a reason for <em>being</em> a [one-boxer]. It might provide a reason to try (before [the boxes are filled]) to change her type <em>if she thinks this might affect [Omega's] prediction</em>, but it gives her no reason for doing anything other than taking the money once she comes to believes that she will be unable to influence what [Omega] does.</p> </blockquote> <p>In other words, this response distinguishes between the <em>winning decision</em> and the <em>winning type of agent</em> and claims that two-boxing is the winning decision in Newcomb&rsquo;s problem (even if one-boxers are the winning type of agent). 
Consequently, insofar as decision theory is about determining which <em>decision</em> is rational, on this account CDT reasons correctly in Newcomb&rsquo;s problem.</p> <p>For those who find this response perplexing, an analogy could be drawn to the <em>chewing gum problem</em>. In this scenario, there is near-unanimous agreement that the rational decision is to chew gum. However, statistically, non-chewers will be better off than chewers. As such, the non-chewer could ask, &ldquo;if you&rsquo;re so smart, why aren&rsquo;t you healthy?&rdquo; In this case, the above response seems particularly appropriate. The chewers are less healthy not because of their decision but rather because they&rsquo;re more likely to have an undesirable gene. Having good genes doesn&rsquo;t make the non-chewer more rational but simply luckier. The proponent of CDT simply makes a similar response to Newcomb&rsquo;s problem: one-boxers aren&rsquo;t richer because of their decision but rather because of the type of agent that they were when the boxes were filled.</p> <p>One final point about this response is worth noting. A proponent of CDT can accept the above argument but still acknowledge that, if given the choice before the boxes are filled, they would be rational to choose to modify themselves to be a one-boxing type of agent (as Joyce acknowledged in the above passage and as argued for in <a href="http://www.jstor.org/stable/20118389">Burgess, 2004</a>). To the proponent of CDT, this is unproblematic: if we are sometimes rewarded not for the rationality of our decisions in the moment but for the type of agent we were at some past moment, then it should be unsurprising that changing to a different type of agent might be beneficial.</p> <p>The response to this defense of two-boxing in Newcomb&rsquo;s problem has been divided. Many find it compelling, but others, like <a href="http://link.springer.com/article/10.1007%2Fs10670-011-9355-2">Ahmed and Price (2012)</a>, think it does not adequately address the challenge:</p> <blockquote> <p>It is no use the causalist's whining that foreseeably, Newcomb problems do in fact reward irrationality, or rather CDT-irrationality. The point of the argument is that if everyone knows that the CDT-irrational strategy will in fact do better on average than the CDT-rational strategy, then it's rational to play the CDT-irrational strategy.</p> </blockquote> <p>Given this, there seem to be two positions one could take on these issues. If the response given by the proponent of CDT is compelling, then we should be attempting to develop a decision theory that two-boxes on Newcomb&rsquo;s problem. Perhaps the best theory for this role is CDT, but perhaps it is instead BT, which many people think reasons better in the psychopath button scenario. On the other hand, if the response given by the proponent of CDT is not compelling, then we should be developing a theory that one-boxes in Newcomb&rsquo;s problem. In this case, TDT, or something like it, seems like the most promising theory currently on offer.</p> lukeprog zEWJBFFMvQ835nq6h 2013-02-28T14:15:55.090Z Improving Human Rationality Through Cognitive Change (intro) https://www.lesswrong.com/posts/hR92kW2ZSvmuca5Nf/improving-human-rationality-through-cognitive-change-intro <p><small>This is the introduction to a paper I started writing long ago, but have since given up on. The paper was going to be an overview of methods for improving human rationality through cognitive change.
Since it contains lots of handy references on rationality, I figured I'd publish it, in case it's helpful to others.</small></p> <p>&nbsp;</p> <h2>1. Introduction</h2> <p>During the last half-century, cognitive scientists have catalogued dozens of common errors in human judgment and decision-making (<a href="http://www-personal.umich.edu/~gonzo/papers/griffin-etal-2012-heuristics.pdf">Griffin et al. 2012</a>; <a href="http://www.amazon.com/Heuristics-Biases-Psychology-Intuitive-Judgment/dp/0521796792/">Gilovich et al. 2002</a>). <a href="http://www.amazon.com/What-Intelligence-Tests-Miss-Psychology/dp/0300164629/">Stanovich (1999)</a> provides a sobering introduction:</p> <blockquote> <p>For example, people assess probabilities incorrectly, they display confirmation bias, they test hypotheses inefficiently, they violate the axioms of utility theory, they do not properly calibrate degrees of belief, they overproject their own opinions onto others, they allow prior knowledge to become implicated in deductive reasoning, they systematically underweight information about nonoccurrence when evaluating covariation, and they display numerous other information-processes biases...</p> </blockquote> <p>The good news is that researchers have also begun to understand the cognitive mechanisms which produce these errors (<a href="http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374275637">Kahneman 2011</a>; <a href="http://www.amazon.com/Rationality-Reflective-Mind-Keith-Stanovich/dp/0195341147/">Stanovich 2010</a>), they have found several "debiasing" techniques that groups or individuals may use to partially avoid or correct these errors (<a href="http://commonsenseatheism.com/wp-content/uploads/2011/09/Larrick-Debiasing.pdf">Larrick 2004</a>), and they have discovered that environmental factors can be used to help people to exhibit fewer errors (<a href="http://www.amazon.com/Nudge-Improving-Decisions-Health-Happiness/dp/014311526X/">Thaler and Sunstein 2009</a>; <a href="http://www.amazon.com/The-Empathy-Gap-Building-Bridges/dp/B002CMLR10">Trout 2009</a>).</p> <p>This "heuristics and biases" research program teaches us many lessons that, if put into practice, could improve human welfare. Debiasing techniques that improve human rationality may be able to decrease rates of violence caused by ideological extremism (<a href="http://commonsenseatheism.com/wp-content/uploads/2011/12/Lilienfeld-Giving-debiasing-away.pdf">Lilienfeld et al. 2009</a>). Knowledge of human bias can help executives make more profitable decisions (<a href="http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374275637">Kahneman et al. 2011</a>). Scientists with improved judgment and decision-making skills ("rationality skills") may be more apt to avoid experimenter bias (<a href="http://www.epidemiology.ch/history/PDF%20bg/Sackett%20DL%201979%20bias%20in%20analytic%20research.pdf">Sackett 1979</a>). Understanding the nature of human reasoning can also improve the practice of philosophy (<a href="http://www.annualreviews.org/doi/abs/10.1146/annurev-psych-120710-100350">Knobe et al. 
2012</a>; <a href="http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll127/id/260424/rec/16">Talbot 2009</a>; <a href="http://www.amazon.com/Epistemology-Psychology-Judgment-Michael-Bishop/dp/0195162307">Bishop and Trout 2004</a>; <a href="/lw/fpe/philosophy_needs_to_trust_your_rationality_even/">Muehlhauser 2012</a>), which has too often made false assumptions about how the mind reasons (<a href="http://ruccs.rutgers.edu/ArchiveFolder/Research%20Group/Publications/NEI/NEIPT.html">Weinberg et al. 2001</a>; <a href="http://www.amazon.com/Philosophy-Flesh-Embodied-Challenge-Western/dp/0465056741">Lakoff and Johnson 1999</a>; <a href="http://www.amazon.com/Rethinking-Intuition-Psychology-Philosophical-Epistemology/dp/0847687961">De Paul and Ramsey 1999</a>). Finally, improved rationality could help decision makers to choose better policies, especially in domains likely by their very nature to trigger biased thinking, such as investing (<a href="http://www.amazon.com/Mean-Markets-Lizard-Brains-Irrationality/dp/0470343761">Burnham 2008</a>), military command (<a href="http://www.masshightech.com/stories/2011/11/14/daily30-Raytheon-BBN-lands-105M-game-development-deal.html">Lang 2011</a>; <a href="http://www.au.af.mil/au/awc/awcgate/milreview/williams_bias_mil_d-m.pdf">Williams 2010</a>; <a href="http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA493560">Janser 2007</a>), intelligence analysis (<a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/PsychofIntelNew.pdf">Heuer 1999</a>), or the study of global catastrophic risks (<a href="http://intelligence.org/files/CognitiveBiases.pdf">Yudkowsky 2008a</a>).</p> <p>But is it possible to improve human rationality? The answer, it seems, is "Yes." Lovallo and Sibony (<a href="http://www.veruspartners.net/private/app/webroot/files/cabe10.pdf">2010</a>) showed that when organizations worked to reduce the effect of bias on their investment decisions, they achieved returns of 7% or higher. Multiple studies suggest that a simple instruction to "think about alternative hypotheses" can counteract overconfidence, confirmation bias, and anchoring effects, leading to more accurate judgments (<a href="http://soco.uni-koeln.de/scc4/documents/pspb26.pdf">Mussweiler et al. 2000</a>; <a href="http://watarts.uwaterloo.ca/~dkoehler/reprints/hypothgen.pdf">Koehler 1994</a>; <a href="http://www.communicationcache.com/uploads/1/0/8/8/10887248/reasons_for_confidence.pdf">Koriat et al. 1980</a>). Merely warning people about biases can decrease their prevalence, at least with regard to framing effects (<a href="http://commonsenseatheism.com/wp-content/uploads/2013/02/Chen-Wu-Debiasing-the-framing-effect-the-effect-of-warning-and-involvement.pdf">Cheng and Wu 2010</a>), hindsight bias (<a href="http://www.psych.utoronto.ca/users/hasher/PDF/I%20knew%20it%20all%20along%20or%20did%20I%20Hasher%20et%20al%201981.pdf">Hasher et al. 1981</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2013/02/Reimers-Butler-The-effect-of-outcome-knowledge-on-auditors-judgmental-evaluations.pdf">Reimers and Butler 1992</a>), and the outcome effect (<a href="http://commonsenseatheism.com/wp-content/uploads/2013/02/Clarkson-et-al-Debiasing-the-outcome-effect.pdf">Clarkson et al. 2002</a>). Several other methods have been shown to meliorate the effects of common human biases (<a href="http://commonsenseatheism.com/wp-content/uploads/2011/09/Larrick-Debiasing.pdf">Larrick 2004</a>). 
Judgment and decision-making appear to be skills that can be learned and improved with practice (<a href="http://www.amazon.com/Judgment-Decision-Making-Skill-ebook/dp/B007I2AHBU">Dhami et al. 2012</a>).</p> <p>In this article, I first explain what I mean by "rationality" as a normative concept. I then review the state of our knowledge concerning the causes of human errors in judgment and decision-making (JDM). The largest section of this article summarizes what we currently know about how to improve human rationality through cognitive change (e.g. "rationality training"). I conclude by assessing the prospects for improving human rationality through cognitive change, and by recommending particular avenues for future research.</p> <p><a id="more"></a></p> <p>&nbsp;</p> <h2>2. Normative Rationality</h2> <p>In cognitive science, rationality is a normative concept (<a href="http://commonsenseatheism.com/wp-content/uploads/2013/02/Stanovich-Normative-Models-in-Psychology-Are-Here-to-Stay.pdf">Stanovich 2011</a>). As Stanovich (<a href="http://www.keithstanovich.com/Site/Research_on_Reasoning_files/Stanovich_Oxford_Handbook.pdf">2012</a>) explains, "When a cognitive scientist terms a behavior irrational he/she means that the behavior departs from the optimum prescribed by a particular normative model."</p> <p>This normative model of rationality consists in logic, probability theory, and rational choice theory. In their opening chapter for <em>The Oxford Handbook of Thinking and Reasoning</em>, Chater and Oaksford (<a href="http://www.amazon.com/Handbook-Thinking-Reasoning-Library-Psychology/dp/0199734682">2012</a>) explain:</p> <blockquote> <p>Is it meaningful to attempt to develop a general theory of rationality at all? We might tentatively suggest that it is a prima facie sign of irrationality to believe in alien abduction, or to will a sports team to win in order to increase their chance of victory. But these views or actions might be entirely rational, given suitably nonstandard background beliefs about other alien activity and the general efficacy of psychic powers. Irrationality may, though, be ascribed if there is a clash between a particular belief or behavior and such background assumptions. Thus, a thorough-going physicalist may, perhaps, be accused of irrationality if she simultaneously believes in psychic powers. A theory of rationality cannot, therefore, be viewed as clarifying either what people should believe or how people should act&mdash;but it can determine whether beliefs and behaviors are compatible. Similarly, a theory of rational choice cannot determine whether it is rational to smoke or to exercise daily; but it might clarify whether a particular choice is compatible with other beliefs and choices.</p> <p>From this viewpoint, normative theories can be viewed as clarifying conditions of consistency&hellip; Logic can be viewed as studying the notion of consistency over beliefs. Probability&hellip; studies consistency over degrees of belief.
Rational choice theory studies the consistency of beliefs and values with choices.</p> </blockquote> <p>There are many good tutorials on logic (<a href="http://www.amazon.com/Classical-Nonclassical-Logics-Introduction-Propositions/dp/0691122792">Schechter 2005</a>), probability theory (<a href="http://www.amazon.com/Probabilistic-Graphical-Models-Principles-Computation/dp/0262013193">Koller and Friedman 2009</a>), and rational choice theory (<a href="http://www.amazon.com/Choice-Theory-Very-Short-Introduction/dp/0192803034">Allingham 2002</a>; <a href="http://www.amazon.com/Decision-Theory-Principles-Approaches-Probability/dp/047149657X">Parmigiani and Inoue 2009</a>), so I will make only two quick points here. First, by "probability" I mean the subjective or Bayesian interpretation of probability, because that is the interpretation which captures degrees of belief (<a href="http://www.amazon.com/Bayesian-Rationality-Probabilistic-Reasoning-Cognitive/dp/0198524498">Oaksford and Chater 2007</a>; <a href="http://www.amazon.com/Probability-Theory-Science-T-Jaynes/dp/0521592712">Jaynes 2003</a>; <a href="http://jimbeck.caltech.edu/summerlectures/references/ProbabilityFrequencyReasonableExpectation.pdf">Cox 1946</a>). Second, in rational choice theory I am of course endorsing the normative principle of expected utility maximization (<a href="http://pavroz.ru/files/handbookofrationalandsocialchoice.pdf">Grant &amp; Van Zandt 2009</a>).</p> <p>According to this concept of rationality, then, an agent is rational if its beliefs are consistent with the laws of logic and probability theory and its decisions are consistent with the laws of rational choice theory. An agent is irrational to the degree that its beliefs violate the laws of logic or probability theory, or its decisions violate the laws of rational choice theory.<sup>1</sup></p> <p>Researchers working in the heuristics and biases tradition have shown that humans regularly violate the norms of rationality (<a href="http://www.amazon.com/Thinking-Reasoning-Introduction-Psychology-Judgment/dp/1841697419">Manktelow 2012</a>; <a href="http://www.amazon.com/Cognitive-Illusions-Handbook-Fallacies-Judgement/dp/1841693510">Pohl 2005</a>). These researchers tend to assume that human reasoning could be improved, and thus they have been called "Meliorists" (<a href="http://rfwest.net/Site_2/Welcome_files/Stanovich-Under-Accept-CP99.pdf">Stanovich 1999</a>, <a href="http://www.amazon.com/Robots-Rebellion-Finding-Meaning-Darwin/dp/0226771253">2004</a>), and their program of using psychological findings to make recommendations for improving human reasoning has been called "ameliorative psychology" (<a href="http://www.amazon.com/dp/0195162307">Bishop and Trout 2004</a>).</p> <p>Another group of researchers, termed the "Panglossians,"<sup>2</sup> argue that human performance is generally "rational" because it manifests an evolutionary adaptation for optimal information processing (<a href="http://www.dangoldstein.com/papers/Czerlinski_How_Good_Are_Simple_Heuristics_Chapter_1999.pdf">Gigerenzer et al. 
1999</a>).</p> <p>I disagree with the Panglossian view for reasons detailed elsewhere (<a href="http://books.google.com/books?id=DwZTCoV3cP4C&amp;pg=PA27&amp;lpg=PA27&amp;dq=%22in+contrast+to+the+emphasis+on+errors+in+much+of+the+literature+on+judgment+and+decision-making%22&amp;source=bl&amp;ots=jyVv6JLtz_&amp;sig=In2N57M0AHAhPI3htZ23SqH5w3w&amp;hl=en&amp;sa=X&amp;ei=xV4pUay9KI3LigLhsICwAg&amp;ved=0CDMQ6AEwAA#v=onepage&amp;q=%22in%20contrast%20to%20the%20emphasis%20on%20errors%20in%20much%20of%20the%20literature%20on%20judgment%20and%20decision-making%22&amp;f=false">Griffiths et al. 2012:27</a>; <a href="http://www.amazon.com/Rationality-Reflective-Mind-Keith-Stanovich/dp/0195341147/">Stanovich 2010, ch. 1</a>; <a href="http://www.semioticon.com/virtuals/imitation/kstanovich_paper.pdf">Stanovich and West 2003</a>; <a href="http://www.amazon.com/Without-Good-Reason-Rationality-Philosophy/dp/0198235747">Stein 1996</a>), though I also believe the original dispute between Meliorists and Panglossians has been exaggerated (<a href="http://sas-space.sas.ac.uk/938/1/R_Samuels_Rationality.pdf">Samuels et al. 2002</a>). In any case, a verbal dispute over what counts as "normative" for human JDM need not detain us here.<sup>3</sup> I have stipulated my definition of normative rationality &mdash; for the purposes of cognitive psychology &mdash; above. My concern is with the question of whether cognitive change can improve human JDM in ways that enable humans to achieve their goals more effectively than without cognitive change, and it seems (as I demonstrate below) that the answer is "yes."</p> <p>My view of normative rationality does not imply, however, that humans ought to explicitly use the laws of rational choice theory to make every decision. Neither humans nor machines have the knowledge and resources to do so (<a href="http://staff.science.uva.nl/~szymanik/papers/TractableCognition.pdf">Van Rooij 2008</a>; <a href="http://www.worldscientific.com/doi/pdf/10.1142/S1793843011000686">Wang 2011</a>). Thus, in order to approximate normative rationality as best we can, we often (rationally) engage in a "bounded rationality" (<a href="http://www.math.mcgill.ca/vetta/CS764.dir/bounded.pdf">Simon 1957</a>) or "ecological rationality" (<a href="http://www.amazon.com/Ecological-Rationality-Intelligence-Evolution-Cognition/dp/0195315448">Gigerenzer and Todd 2012</a>) or "grounded rationality" (<a href="http://commonsenseatheism.com/wp-content/uploads/2013/02/Elqayam-Grounded-rationality-A-relativist-framework-for-normative-rationality.pdf">Elqayam 2011</a>) that employs simple heuristics to imperfectly achieve our goals with the limited knowledge and resources at our disposal (<a href="http://people.csail.mit.edu/kraemer/draft-vul.pdf">Vul 2010</a>; <a href="http://cocosci.berkeley.edu/tom/papers/oneanddone.pdf">Vul et al. 2009</a>; <a href="http://mba.yale.edu/faculty/pdf/fredericks_Model_of_Heuristic_Judgment.pdf">Kahneman and Frederick 2005</a>). Accordingly, the best prescription for human reasoning is not necessarily to always use the normative model to govern one's thinking (<a href="http://pavroz.ru/files/handbookofrationalandsocialchoice.pdf">Grant &amp; Van Zandt 2009</a>; <a href="http://www.amazon.com/Who-Rational-individual-Differences-Reasoning/dp/0805824731">Stanovich 1999</a>; <a href="http://www.amazon.com/Rationality-Intelligence-Jonathan-Baron/dp/052126717X">Baron 1985</a>). Baron (<a href="http://www.amazon.com/Thinking-Deciding-Jonathan-Baron/dp/0521680433">2008, ch. 
2</a>) explains:</p> <blockquote> <p>In short, normative models tell us how to evaluate judgments and decisions in terms of their departure from an ideal standard. Descriptive models specify what people in a particular culture actually do and how they deviate from the normative models. Prescriptive models are designs or inventions, whose purpose is to bring the results of actual thinking into closer conformity to the normative model. If prescriptive recommendations derived in this way are successful, the study of thinking can help people to become better thinkers.</p> </blockquote> <p>&nbsp;</p> <p>[next, I was going to discuss the probable causes of JDM errors, tested methods for amelioration, and promising avenues for further research]</p> <p>&nbsp;</p> <h3>Notes</h3> <p><small><sup>1</sup> For a survey of other conceptions of rationality, see <a href="http://www.amazon.com/Aspects-Rationality-Reflections-Rational-Whether/dp/1841694878/">Nickerson (2007)</a>. Note also that our concept of rationality is personal, not subpersonal (<a href="http://www.open.ac.uk/Arts/philos/Systems_and_levels_preprint.pdf">Frankish 2009</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2013/02/Davies-Interaction-without-Reduction-the-relationship-between-personal-and-sub-personal-levels-of-description.pdf">Davies 2000</a>; <a href="http://www.amazon.com/Rationality-Reflective-Mind-Keith-Stanovich/dp/0195341147/">Stanovich 2010</a>:5).</small></p> <p><small><sup>2</sup> The adjective "Panglossian" was originally applied by <a href="http://rspb.royalsocietypublishing.org/content/205/1161/581.full.pdf">Stephen Jay Gould and Richard Lewontin (1979)</a>, who used it to describe knee-jerk appeals to natural selection as the force that explains every trait. The term comes from Voltaire's character Dr. Pangloss, who said that "our noses were made to carry spectacles" <a href="http://www.amazon.com/Candide-Voltaire/dp/1478205148/">(Voltaire 1759)</a>.</small></p> <p><small><sup>3</sup> To resolve such verbal disputes we can employ the "method of elimination" (<a href="http://consc.net/oxford/chap9.pdf">Chalmers 2011</a>) or, as <a href="/lw/nv/replace_the_symbol_with_the_substance/">Yudkowsky (2008)</a> put it, we can "replace the symbol with the substance."</small></p> lukeprog hR92kW2ZSvmuca5Nf 2013-02-24T04:49:48.976Z Great rationality posts in the OB archives https://www.lesswrong.com/posts/RKoxt8FBtLWXXPirZ/great-rationality-posts-in-the-ob-archives <p>Those aching for good rationality writing can get their fix from&nbsp;<a href="/lw/gof/great_rationality_posts_by_lwers_not_posted_to_lw/">Great rationality posts by LWers not posted to LW</a>, and also from the <a href="http://www.overcomingbias.com/">Overcoming Bias</a> archives. Some highlights are below, up through June 28, 2007.</p> <p>&nbsp;</p> <ul> <li>Finney, <a href="http://www.overcomingbias.com/2006/11/foxes_vs_hedgho.html">Foxes vs. 
Hedgehogs: Predictive Success</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2006/12/when_error_is_h.html">When Error is High, Simplify</a></li> <li>Shulman, <a href="http://www.overcomingbias.com/2006/12/meme_lineages_a.html">Meme Lineages and Expert Consensus</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2006/12/resolving_your_.html">Resolving Your Hypocrisy</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2006/12/academic_overco.html">Academic Overconfidence</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2007/01/conspicuous_con.html">Conspicuous Consumption of Info</a></li> <li>Sandberg, <a href="http://www.overcomingbias.com/2007/01/supping_with_th.html">Supping with the Devil</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2007/01/conclusionblind.html">Conclusion-Blind Review</a></li> <li>Shulman, <a href="http://www.overcomingbias.com/2007/01/should_we_defer.html">Should We Defer to Secret Evidence?</a></li> <li>Shulman, <a href="http://www.overcomingbias.com/2007/01/sick_of_textboo.html">Sick of Textbook Errors</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2007/02/dare_to_deprogr.html">Dare to Deprogram Me?</a></li> <li>Armstrong, <a href="http://www.overcomingbias.com/2007/03/biases_by_and_l.html">Biases, By and Large</a></li> <li>Friedman, <a href="http://www.overcomingbias.com/2007/04/a_tough_balanci.html">A Tough Balancing Act</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2007/05/rand_health_ins.html">RAND Health Insurance Experiment</a></li> <li>Armstrong, <a href="http://www.overcomingbias.com/2007/05/the_case_for_da.html">The Case for Dangerous Testing</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2007/05/in_obscurity_er.html">In Obscurity Errors Remain</a></li> <li>Falkenstein, <a href="http://www.overcomingbias.com/2007/05/hofstadters_law.html">Hofstadter's Law</a></li> <li>Hanson, <a href="http://www.overcomingbias.com/2007/06/against_free_th.html">Against Free Thinkers</a></li> </ul> <div><br /></div> <p>&nbsp;</p> lukeprog RKoxt8FBtLWXXPirZ 2013-02-23T23:33:51.624Z Great rationality posts by LWers not posted to LW https://www.lesswrong.com/posts/xZdW7D43AaCiQQzvM/great-rationality-posts-by-lwers-not-posted-to-lw <p>Ever since Eliezer, Yvain, and myself stopped posting regularly, LW's front page has mostly been populated by meta posts. (The Discussion section is still abuzz with interesting content, though, including <a href="/lw/f6o/original_research_on_less_wrong/">original research</a>.)</p> <p>Luckily, many LWers are posting potentially front-page-worthy content to <a href="/lw/d8t/blogs_by_lwers/">their own blogs</a>.</p> <p>Below are some recent-ish highlights outside Less Wrong, for your reading enjoyment. 
I've added an <strong>*</strong> to my personal favorites.</p> <p>&nbsp;</p> <p><strong><a href="http://www.overcomingbias.com/">Overcoming Bias</a></strong> (Robin Hanson, Rob Wiblin, Katja Grace, Carl Shulman)</p> <ul> <li>Hanson, <a href="http://www.overcomingbias.com/2013/02/beware-far-values.html">Beware Far Values</a></li> <li>Wiblin, <a href="http://www.overcomingbias.com/2013/02/is-us-gun-control-an-important-issue.html">Is US Gun Control an Important Issue?</a></li> <li>Wiblin, <a href="http://www.overcomingbias.com/2013/01/morality-as-though-it-really-mattered.html">Morality As Though It Really Mattered</a></li> <li>Grace, <a href="http://www.overcomingbias.com/2012/11/can-a-tiny-bit-of-noise-destroy-communication.html">Can a Tiny Bit of Noise Destroy Communication?</a></li> <li>Shulman,&nbsp;<a href="http://www.overcomingbias.com/2012/11/nuclear-winter-and-human-extinction-qa-with-luke-oman.html">Nuclear winter and human extinction: Q&amp;A with Luke Oman</a></li> <li>Wiblin,&nbsp;<a href="http://www.overcomingbias.com/2012/11/does-complexity-bias-biotechnology-towards-destruction.html">Does complexity bias biotechnology towards doing damage?</a></li> </ul> <div><strong><a href="http://squid314.livejournal.com/">Yvain</a></strong> (now moved <a href="http://slatestarcodex.com/">here</a>)</div> <div> <ul> <li><a href="http://squid314.livejournal.com/354867.html">Kurzweil's Law of Accelerating Returns</a>&nbsp;<strong>*</strong></li> <li><a href="http://squid314.livejournal.com/352945.html">The Great Stagnation</a></li> <li><a href="http://squid314.livejournal.com/350090.html">Epistemic Learned Helplessness</a>&nbsp;<strong>*</strong></li> <li><a href="http://squid314.livejournal.com/346391.html">The Biodeterminist's Guide to Parenting</a></li> </ul> <div><a href="http://rationalconspiracy.com/"><strong>The Rationalist Conspiracy</strong></a> (Alyssa Vance)</div> <div> <ul> <li><a href="http://rationalconspiracy.com/2013/01/24/what-caring-is/">What Caring Is</a></li> <li><a href="http://rationalconspiracy.com/2012/11/23/the-real-america-of-2022/">The Real America of 2022</a></li> <li><a href="http://rationalconspiracy.com/2012/11/25/why-most-online-medical-information-sucks/">Why Most Online Medical Information Sucks</a></li> </ul> <div><strong><a href="http://reflectivedisequilibrium.blogspot.com/">Reflective Disequilibrium</a></strong> (Carl Shulman)</div> <div> <ul> <li><a href="http://reflectivedisequilibrium.blogspot.com/2012/09/spreading-happiness-to-stars-seems.html">Spreading happiness to the stars seems little harder than just spreading</a></li> <li><a href="http://reflectivedisequilibrium.blogspot.com/2012/07/rawls-original-position-potential.html">Rawls' original position, potential people, and Pascal's Mugging</a></li> <li><a href="http://reflectivedisequilibrium.blogspot.com/2012/05/philosophers-vs-economists-on.html">Philosophers vs economists on discounting</a></li> <li><a href="http://reflectivedisequilibrium.blogspot.com/2012/05/utilitarianism-contractualism-and-self.html">Utilitarianism, contractualism, and self-sacrifice</a></li> <li><a href="http://reflectivedisequilibrium.blogspot.com/2012/03/are-pain-and-pleasure-equally-energy.html">Are pain and pleasure equally energy-efficient?</a>&nbsp;<strong>*</strong></li> </ul> <div><strong><a href="http://rationalaltruist.com/">Rational Altruist</a></strong> (Paul Christiano)</div> <div> <ul> <li><a href="http://rationalaltruist.com/2013/01/27/ethical-questions/">Pressing Ethical Questions</a></li> <li><a 
href="http://rationalaltruist.com/2013/01/22/replaceability/">Replaceability</a>&nbsp;<strong>*</strong></li> <li><a href="http://rationalaltruist.com/2013/01/06/how-useful-is-progress/">How Useful is Progress?</a></li> </ul> <div><strong><a href="http://alexvermeer.com/">Alex Vermeer</a></strong></div> <div> <ul> <li><a href="http://alexvermeer.com/15-benefits-growth-mindset/">15 Benefits of the Growth Mindset</a></li> </ul> <div><a href="http://princemm.wordpress.com/"><strong>Prince Mm Mm</strong></a> (Giles)</div> <div> <ul> <li><a href="http://princemm.wordpress.com/2012/03/26/anthropic-principle-primer/">Anthropic Principle Primer</a></li> <li><a href="http://princemm.wordpress.com/2012/01/31/entropy-and-unconvincing-models/">Entropy and Unconvincing Models</a></li> </ul> <div><br /></div> </div> </div> </div> </div> </div> </div> lukeprog xZdW7D43AaCiQQzvM 2013-02-16T00:31:20.077Z A reply to Mark Linsenmayer about philosophy https://www.lesswrong.com/posts/2r28gzYtALsb7M3aR/a-reply-to-mark-linsenmayer-about-philosophy <p><a href="http://www.partiallyexaminedlife.com/about-the-participants/">Mark Linsenmayer</a>, one of the hosts of a top philosophy podcast called <em><a href="http://www.partiallyexaminedlife.com/">The Partially Examined Life</a></em>, has written a <a href="http://www.partiallyexaminedlife.com/2013/01/03/eliezer-yudkowsky-and-luke-muehlhauser-on-modern-rationalism-conversations-from-the-pale-blue-dot/">critique</a> of the view that <a href="http://yudkowsky.net/">Eliezer</a> and I seem to take of philosophy. Below, I respond to a few of Mark's comments. Naturally, I speak only for myself, not for Eliezer.</p> <p>&nbsp;</p> <p>&nbsp;</p> <blockquote> <p>I'm generally skeptical when someone proclaims that "rationality" itself should get us to throw out 90%+ of philosophy...</p> </blockquote> <p><a href="http://en.wikipedia.org/wiki/Sturgeon's_Law">Sturgeon's Law</a> declares that "90% of everything is crap." I think <em>something</em> like that is true, though perhaps it's 88% crap in physics, 99% crap in philosophy, and 99.99% crap on <a href="http://www.4chan.org/">4chan</a>.</p> <p>But let me be more precise. I <em>do</em> claim that almost all philosophy is useless <em>for figuring out <a href="/lw/eqn/the_useful_idea_of_truth/">what is true</a></em>, for reasons explained in several of my posts:</p> <ul> <li><a href="/lw/4zs/philosophy_a_diseased_discipline/">Philosophy: A Diseased Discipline</a> </li> <li><a href="/lw/4zs/philosophy_a_diseased_discipline/">Concepts Don't Work That Way</a> </li> <li><a href="/lw/foz/philosophy_by_humans_3_intuitions_arent_shared/">Intuitions Aren't Shared That Way</a> </li> <li><a href="/lw/frp/train_philosophers_with_pearl_and_kahneman_not/">Train Philosophers with Pearl and Kahneman, Not Plato and Kant</a> </li> </ul> <p>Mark replies that the kinds of unscientific philosophy I dismiss can be "useful at least in the sense of entertaining," which of course isn't something I'd deny. I'm just trying to say that Heidegger is pretty darn useless for figuring out what's true. There are thousands of readings that will more efficiently make your model of the world more accurate.</p> <p>If you want to read Heidegger as poetry or entertainment, that's fine. 
I watch <em>Game of Thrones</em>, but not because it's a useful inquiry into truth.</p> <p>Also, I'm not sure what it would mean to say we should throw out 90% of philosophy <em>because of rationality</em>, but I probably don't agree with the "because" clause, there.</p> <p>&nbsp;</p> <p>&nbsp;</p> <p><a id="more"></a></p> <blockquote> <p>[Luke's] accusation is that most philosophizing is useless unless explicitly based on scientific knowledge on how the brain works, and in particular where intuitions come from... [But] to then throw out the mass of the philosophical tradition because it has been ignorant of [cognitive biases] is [a mistake].</p> </blockquote> <p>I don't, in fact, think that "most philosophizing is useless unless explicitly based on scientific knowledge [about] how the brain works," nor do I "throw out the mass of the philosophical tradition because it has been ignorant of [cognitive biases]." Sometimes, people do pretty good philosophy without knowing much of modern psychology. Look at all the progress Hume and Frege made.</p> <p>What I <em>do</em> claim is that many <em>specific</em> philosophical positions and methods are undermined by scientific knowledge about how brains and other systems work. For example, I've <a href="/lw/7tz/philosophy_by_humans_1_concepts_dont_work_that_way/">argued</a> that a particular kind of philosophical analysis, which assumes concepts are defined by necessary and sufficient conditions, is undermined by psychological results showing that brains don't store concepts that way.</p> <p>If some poor philosopher doesn't know this, because she thinks it's okay to spend all day using her brain to philosophize without knowing much about how brains work, she might spend several years of her career pointlessly trying to find a necessary-and-sufficient-conditions analysis of knowledge that is immune to Gettier-style counterexamples.</p> <p>That's one reason to study psychology before doing much philosophy. Doing so can save you lots of time.</p> <p>Another reason to study psychology is that psychology is a significant component of <a href="http://appliedrationality.org/">rationality training</a> (yes, with daily study and exercise, like piano training). Rationality training is important for doing philosophy because <a href="/r/lesswrong/lw/fpe/philosophy_needs_to_trust_your_rationality_even/">philosophy needs to trust your rationality even though it shouldn't</a>.</p> <p>&nbsp;</p> <p>&nbsp;</p> <blockquote> <p>...Looking over Eliezer's site and Less Wrong... my overall impression is again that... none of this adds up to the blanket critique/world-view that comes through very clearly</p> </blockquote> <p>Less Wrong is a group blog, so it doesn't quite have its own philosophy or worldview.</p> <p><em>Eliezer</em>, however, most certainly does. His approach to epistemology is pretty thoroughly documented in the ongoing, book-length sequence <a href="http://wiki.lesswrong.com/wiki/Highly_Advanced_Epistemology_101_for_Beginners">Highly Advanced Epistemology 101 for Beginners</a>. 
Additional parts of his "worldview" comes to light in his many posts on <a href="/lw/od/37_ways_that_words_can_be_wrong/">philosophy of language</a>, <a href="http://wiki.lesswrong.com/wiki/Free_will_(solution">free will</a>, <a href="http://wiki.lesswrong.com/wiki/Reductionism_(sequence">metaphysics</a>, <a href="http://wiki.lesswrong.com/wiki/Metaethics_sequence">metaethics</a>, <a href="/lw/n3/circular_altruism/">normative ethics</a>, <a href="/lw/xy/the_fun_theory_sequence/">axiology</a>, and <a href="http://wiki.lesswrong.com/wiki/Zombies_(sequence">philosophy of mind</a>.</p> <p>I've written less about my own philosophical views, but you can get some of them in two (ongoing) sequences: <a href="http://wiki.lesswrong.com/wiki/Rationality_and_Philosophy">Rationality and Philosophy</a> and <a href="http://wiki.lesswrong.com/wiki/No-Nonsense_Metaethics">No-Nonsense Metaethics</a>.</p> <p>&nbsp;</p> <p>&nbsp;</p> <blockquote> <p>I think it's instructive to contrast Eliezer with David Chalmers... who is very much on top of the science in his field... and yet he is not on board with any of this "commit X% of past philosophy to the flames" nonsense, doesn't think metaphysical arguments are meaningless or that difficult philosophical problems need to be defined away in some way, and, most provocatively, sees in consciousness a challenge to a physicalist world-view... I respectfully suggest that while reading more in contemporary science is surely a good idea... the approach to philosophy that is actually schooled in philosophy a la Chalmers is more worthy of emulation than Eliezer's dismissive anti-philosophy take.</p> </blockquote> <p>Chalmers is a smart dude, a good writer, and fun to hang with. But Mark doesn't explain here <em>why</em> it's "nonsense" to propose that truth-seekers (<em>qua</em> truth-seekers) should ignore 99% of all philosophy, <em>why</em> many metaphysical arguments aren't meaningless, <em>why</em> some philosophical problems can't simply be dissolved, nor why Chalmers' approach to philosophy is superior to Eliezer's.</p> <p>And that's fine. As Mark wrote, "I intended this post to be a high-level overview of positions." I'd just like to flag that arguments weren't provided in Mark's post.</p> <p>Meanwhile, I've linked above to many posts Eliezer and I have written about why most philosophy is useless for truth-seeking, why some metaphysical arguments are meaningless, and why some philosophical problems can be dissolved. (We'd have to be more specific about the Chalmers vs. Eliezer question before I could weigh in. For example, I find Chalmers' writing to be clearer, but Eliezer's choice of topics for investigation more important for the human species.)</p> <p>Finally, I'll note that <a href="http://nickbostrom.com/">Nick Bostrom</a> takes roughly the same approach to philosophy as Eliezer and I do, but Nick has a position at Oxford University, publishes in leading philosophy journals, and so on. On philosophical method, I recommend Nick's first professional paper, <a href="http://www.nickbostrom.com/old/predict.html">Predictions from Philosophy</a> (1997). 
It sums up the motivation behind much of what Nick and Eliezer have done since then.</p> lukeprog 2r28gzYtALsb7M3aR 2013-01-05T11:25:25.242Z Ideal Advisor Theories and Personal CEV https://www.lesswrong.com/posts/q9ZSXiiA7wEuRgnkS/ideal-advisor-theories-and-personal-cev <p><strong>Update 5-24-2013</strong>: A cleaned-up, citable version of this article is now available <a href="http://intelligence.org/files/IdealAdvisorTheories.pdf">on MIRI's website</a>.</p> <p>Co-authored with <a href="/user/crazy88/">crazy88</a></p> <p><small><em>Summary</em>: Yudkowsky's "coherent extrapolated volition" (CEV) concept shares much in common with Ideal Advisor theories in moral philosophy. Does CEV fall prey to the same objections which are raised against Ideal Advisor theories? Because CEV is an epistemic rather than a metaphysical proposal, it seems that at least one family of CEV approaches (inspired by Bostrom's parliamentary model) may escape the objections raised against Ideal Advisor theories. This is not a particularly ambitious post; it mostly aims to place CEV in the context of mainstream moral philosophy.</small></p> <p>What is of value to an agent? Maybe it's just whatever they desire. Unfortunately, our desires are often the product of ignorance or confusion. I may desire to drink from the glass on the table because I think it is water when really it is bleach. So perhaps something is of value to an agent if they would desire that thing <em>if fully informed</em>. But here we crash into a different problem. It might be of value for an agent who wants to go to a movie to look up the session times, but the fully informed version of the agent will not desire to do so &mdash; they are fully-informed and hence already know all the session times. The agent and its fully-informed counterparts have different needs. Thus, several philosophers have suggested that something is of value to an agent if an ideal version of that agent (fully informed, perfectly rational, etc.) would <em>advise</em> the non-ideal version of the agent to pursue that thing.</p> <p>This idea of idealizing or extrapolating an agent's preferences<sup>1</sup> goes back at least as far as <a href="http://www.amazon.com/The-methods-ethics-Henry-Sidgwick/dp/1171791895/">Sidgwick (1874)</a>, who considered the idea that "a man's future good" consists in "what he would now desire... if all the consequences of all the different [actions] open to him were accurately foreseen..." Similarly, <a href="http://www.amazon.com/A-Theory-Justice-John-Rawls/dp/0674000781/">Rawls (1971)</a> suggested that a person's good is the plan "that would be decided upon as the outcome of careful reflection in which the agent reviewed, in the light of all the relevant facts, what it would be like to carry out these plans..." More recently, in an article about rational agents and moral theory, <a href="http://www.amazon.com/Utilitarianism-Beyond-Amartya-Sen/dp/0521287715">Harsanyi (1982)</a> defined an agent's rational wants as &ldquo;the preferences he <em>would</em> have if he had all the relevant factual information, always reasoned with the greatest possible care, and were in a state of mind most conducive to rational choice.&rdquo; Then, a few years later, <a href="http://commonsenseatheism.com/wp-content/uploads/2012/12/Railton-Facts-and-Values.pdf">Railton (1986)</a> identified a person's good with "what he would want himself to want... 
were he to contemplate his present situation from a standpoint fully and vividly informed about himself and his circumstances, and entirely free of cognitive error or lapses of instrumental rationality."</p> <p><a href="http://commonsenseatheism.com/wp-content/uploads/2012/12/Rosati-Persons-perspectives-and-full-information-accounts-of-the-good.pdf">Rosati (1995)</a> calls these theories Ideal Advisor theories of value because they identify one's personal value with what an ideal version of oneself would advise the non-ideal self to value.</p> <p>Looking not for a metaphysical account of value but for a practical solution to machine ethics (<a href="http://www.amazon.com/Moral-Machines-Teaching-Robots-Right/dp/0199737975/">Wallach &amp; Allen 2009</a>; <a href="http://intelligence.org/files/SaME.pdf">Muehlhauser &amp; Helm 2012</a>), <a href="http://intelligence.org/files/CEV.pdf">Yudkowsky (2004)</a> described a similar concept which he calls "coherent extrapolated volition" (CEV):</p> <blockquote> <p>In poetic terms, our <em>coherent extrapolated volition</em> is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.</p> </blockquote> <p>In other words, the CEV of humankind is about the preferences that we would have as a species if our preferences were extrapolated in certain ways. Armed with this concept, Yudkowsky then suggests that we implement CEV as an "initial dynamic" for "Friendly AI." <a href="http://intelligence.org/files/CEV-MachineEthics.pdf">Tarleton (2010)</a> explains that the intent of CEV is that "our volition be extrapolated <em>once</em> and acted on. In particular, the initial extrapolation could generate an object-level goal system we would be willing to endow a superintelligent [machine] with."</p> <p>CEV theoretically avoids many problems with other approaches to machine ethics (Yudkowsky 2004; Tarleton 2010; Muehlhauser &amp; Helm 2012). However, there are reasons it may not succeed. In this post, we examine one such reason: Resolving CEV at the level of humanity (<em>Global CEV</em>) might require at least partially resolving CEV at the level of individuals (<em>Personal CEV</em>)<sup>2</sup>, but Personal CEV is similar to ideal advisor theories of value,<sup>3</sup> and such theories face well-explored difficulties. As such, these difficulties may undermine the possibility of determining the Global CEV of humanity.</p> <p>Before doing so, however, it's worth noting one key difference between Ideal Advisor theories of value and Personal CEV. Ideal Advisor theories typically are linguistic or metaphysical theories, while the role of Personal CEV is epistemic. Ideal Advisor theorists attempt to define <em>what it is</em> for something to be of value for an agent. Because of this, their accounts need to give an unambiguous and plausible answer in all cases. On the other hand, Personal CEV's role is an epistemic one: it isn't intended to define what is of value for an agent. Rather, Personal CEV is offered as a technique that can help an AI to <em>come to know</em>, to some reasonable but not necessarily perfect level of accuracy, what is of value for the agent. 
To put it more precisely, Personal CEV is intended to allow an initial AI to determine what sort of superintelligence to create such that we end up with what Yudkowsky calls a "Nice Place to Live." Given this, certain arguments are likely to threaten Ideal Advisor theories but not Personal CEV, and vice versa.</p> <p>With this point in mind, we now consider some objections to ideal advisor theories of value, and examine whether they threaten Personal CEV.</p> <p><a id="more"></a></p> <p>&nbsp;</p> <h3>Sobel's First Objection: Too many voices</h3> <p>Four prominent objections to ideal advisor theories are due to <a href="http://commonsenseatheism.com/wp-content/uploads/2011/03/Sobel-Full-Information-Accounts-of-Well-Being.pdf">Sobel (1994)</a>. The first of these, the &ldquo;too many voices&rdquo; objection, notes that the evaluative perspective of an agent changes over time and, as such, the views that would be held by the perfectly rational and fully informed version of the agent will also change. This implies that each agent will be associated not with one idealized version of themselves but with a set of such idealized versions (one at time <em>t</em>, one at time <em>t+1</em>, etc.), some of which may offer conflicting advice. Given this &ldquo;discordant chorus,&rdquo; it is unclear how the agent&rsquo;s non-moral good should be determined.</p> <p>Various responses to this objection run into their own challenges. First, privileging a single perspective (say, the idealized agent at time <em>t+387</em>) seems ad hoc. Second, attempting to aggregate the views of multiple perspectives runs into the question of how trade-offs should be made. That is, if two of the idealized viewpoints disagree about what is to be preferred, it&rsquo;s unclear how an overall judgment should be reached.<sup>4</sup> Finally, the suggestion that the idealized versions of the agent at different times will all share the same perspective seems unlikely to be true, and surely it's a substantive claim requiring a substantive defense. So the obvious responses to Sobel&rsquo;s first objection introduce serious new challenges which then need to be resolved.</p> <p>One final point is worth noting: it seems that this objection is equally problematic for Personal CEV. The extrapolated volition of the agent is likely to vary at different times, so how ought we determine an overall account of the agent&rsquo;s extrapolated volition?</p> <p>&nbsp;</p> <h3>Sobel&rsquo;s Second and Third Objections: Amnesia</h3> <p>Sobel&rsquo;s second and third objections build on two other claims (see <a href="http://commonsenseatheism.com/wp-content/uploads/2011/03/Sobel-Full-Information-Accounts-of-Well-Being.pdf">Sobel 1994</a> for a defense of these). First: some lives can only be evaluated if they are experienced. Second: experiencing one life can leave you incapable of experiencing another in an unbiased way. Given these claims, Sobel presents an <em>amnesia model</em> as the most plausible way for an idealized agent to gain the experiences necessary to evaluate all the relevant lives. According to this model, an agent experiences each life sequentially but undergoes an amnesia procedure after each one so that they may experience the next life uncolored by their previous experiences. 
After the agent has experienced all of the lives, the amnesia is removed.</p> <p>Following on from this, Sobel&rsquo;s second objection is that the sudden recollection of a life from one evaluative perspective and living a life from a vastly different evaluative perspective may be strongly dissimilar experiences. So when the amnesia is removed, the agent has a particular evaluative perspective (informed by their memories of all the lives they&rsquo;ve lived) that differs so much from the evaluative perspective they had when they lived the life independently of such memories that they might be incapable of adequately evaluating the lives they&rsquo;ve experienced based on their current, more knowledgeable, evaluative perspective.</p> <p>Sobel&rsquo;s third objection also relates to the amnesia model: Sobel argues that the idealized agent might be driven insane by the entire amnesia process and hence might not be able to adequately evaluate what advice they ought to give the non-ideal agent. In response to this, there is some temptation to simply demand that the agent be idealized not just in terms of rationality and knowledge but also in terms of their sanity. However, perhaps any idealized agent that is similar enough to the original to serve as a standard for their non-moral good will be driven insane by the amnesia process and so the demand for a sane agent will simply mean that no adequate agent can be identified.</p> <p>If we grant that an agent needs to experience some lives to evaluate them, and we grant that experiencing some lives leaves them incapable of experiencing others, then there seems to be a strong drive for Personal CEV to rely on an amnesia model to adequately determine what an agent&rsquo;s volition would be if extrapolated. If so, however, then Personal CEV seems to face the challenges raised by Sobel.</p> <p>&nbsp;</p> <h3>Sobel&rsquo;s Fourth Objection: Better Off Dead</h3> <p>Sobel&rsquo;s final objection is that the idealized agent, having experienced such a level of perfection, might come to the conclusion that their non-ideal counterpart is so limited as to be better off dead. Further, the ideal agent might make this judgment because of the relative level of well-being of the non-ideal agent rather than the agent&rsquo;s absolute level of well-being. (That is, the ideal agent may look upon the well-being of the non-ideal agent as we might look upon our own well-being after an accident that caused us severe mental damage. In such a case, we might be unable to objectively judge our life after the accident due to the relative difficulty of this life as compared with our life before the accident.) As such, this judgment may not capture what is actually in accordance with the agent&rsquo;s non-moral good.</p> <p>Again, this criticism seems to apply equally to Personal CEV: when the volition of an agent is extrapolated, it may turn out that this volition endorses killing the non-extrapolated version of the agent. If so, this seems to be a mark against the possibility that Personal CEV can play a useful part in a process that should eventually terminate in a "Nice Place to Live."</p> <p>&nbsp;</p> <h3>A model of Personal CEV</h3> <p>The seriousness of these challenges for Personal CEV is likely to vary depending on the exact nature of the extrapolation process. 
To give a sense of the impact, we will consider one family of methods for carrying out this process: the <em>parliamentary model</em> (inspired by <a href="http://www.overcomingbias.com/2009/01/moral-uncertainty-towards-a-solution.html">Bostrom 2009</a>). According to this model, we determine the Personal CEV of an agent by simulating multiple versions of them, extrapolated from various starting times and along different developmental paths. Some of these versions are then assigned as a parliament where they vote on various choices and make trades with one another.</p> <p>Clearly this approach allows our account of Personal CEV to avoid the too many voices objection. After all, the parliamentary model provides us with an account of how we can aggregate the views of the agent at various times: we should simulate the various agents and allow them to vote and trade on the choices to be made. It is through this voting and trading that the various voices can be combined into a single viewpoint. While this process may not be adequate as a metaphysical account of value, it seems more plausible as an account of Personal CEV as an epistemic notion. Certainly, your authors would deem themselves to be more informed about what they value if they knew the outcome of the parliamentary model for themselves.</p> <p>This approach is also able to avoid Sobel&rsquo;s second and third objections. The objections were specifically targeted at the amnesia model where one agent experienced multiple lives. As the parliamentary model does not utilize amnesia, it is immune to these concerns.</p> <p>What of Sobel&rsquo;s fourth objection? Sobel&rsquo;s concern here is not simply that the idealized agent might advise the agent to kill themselves. After all, sometimes death may, in fact, be of value for an agent. Rather, Sobel&rsquo;s concern is that the idealized agent, having experienced such heights of existence, will become biased against the limited lives of normal agents.</p> <p>It's less clear how the parliamentary model deals with Sobel's fourth objection which plausibly retains its initial force against this model of Personal CEV. However, we're not intending to solve Personal CEV entirely in this short post. Rather, we aim to demonstrate only that the force of Sobel's four objections will depend on the model of Personal CEV selected. Reflection on the parliamentary model makes this point clear.</p> <p>So the parliamentary model seems able to avoid at least three of the direct criticisms raised by Sobel. It is worth noting, however, that some concerns remain. Firstly, for those that accept Sobel&rsquo;s claim that experience is necessary to evaluate some lives, it is clear that no member of the parliament will be capable of comparing their life to all other possible lives, as none will have all the required experience. As such, the agents may falsely judge a certain aspect of their life to be more or less valuable than it, in fact, is. For a metaphysical account of personal value, this problem might be fatal. Whether it is also fatal for the parliamentary model of Personal CEV depends on whether the knowledge of the various members of the parliament is enough to produce a &ldquo;Nice Place to Live&rdquo; regardless of its imperfection.</p> <p>Two more issues might arise. First, the model might require careful selection of who to appoint to the parliament. 
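</p>

<p>To make the structure of the parliamentary model concrete, here is a minimal toy sketch of its aggregation step, written in Python. It is purely illustrative, not Bostrom's or Yudkowsky's actual proposal: the options, the number of delegates, the stubbed-out "extrapolation" (which just returns random preferences), and the Borda-count voting rule are all placeholders we have invented, and the trading step is omitted entirely.</p>

<pre><code># Toy sketch of the parliamentary model's aggregation step (illustrative only).
# "Extrapolation" is stubbed out with random preferences; an actual Personal CEV
# process would have to genuinely simulate extrapolated versions of the agent.
import random
from collections import Counter

OPTIONS = ["option_a", "option_b", "option_c"]  # placeholder choices

def extrapolate(agent_seed, start_time, path):
    """Stub: return this delegate's ranking of OPTIONS, best first."""
    rng = random.Random(agent_seed * 1_000_003 + start_time * 1_009 + path)
    ranking = list(OPTIONS)
    rng.shuffle(ranking)
    return ranking

def parliamentary_choice(agent_seed, n_delegates=101):
    """Seat delegates extrapolated from different times and developmental paths,
    then aggregate their rankings with a Borda count (one possible voting rule)."""
    tallies = Counter()
    for i in range(n_delegates):
        ranking = extrapolate(agent_seed, start_time=i, path=i % 7)
        for position, option in enumerate(ranking):
            tallies[option] += len(ranking) - position  # higher score = more preferred
    winner, _ = tallies.most_common(1)[0]
    return winner

print(parliamentary_choice(agent_seed=42))
</code></pre>

<p>Even as a toy, the sketch makes visible what the model leaves unspecified: which delegates get seated, and which voting or trading rule they use.</p>

<p>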
For example, if most of the possible lives that an agent could live would drive them insane, then selecting which of these agents to appoint to the parliament at random might lead to a vote by the mad. Second, it might seem that this approach to determining Personal CEV will require a reasonable level of accuracy in simulation. If so, there might be concerns about the creation of, and responsibility to, potential moral agents.</p> <p>Given these points, a full evaluation of the parliamentary model will require more detailed specification and further reflection. However, two points are worth noting in conclusion. First, the parliamentary model does seem to avoid at least three of Sobel&rsquo;s direct criticisms. Second, even if this model eventually ends up being flawed on other grounds, the existence of one model of Personal CEV that can avoid three of Sobel&rsquo;s objections gives us reason to expect other promising models of Personal CEV may be discovered.</p> <p>&nbsp;</p> <h3>Notes</h3> <p><sup>1</sup> Another clarification to make concerns the difference between <a href="http://en.wikipedia.org/wiki/Idealization">idealization</a> and <a href="http://en.wikipedia.org/wiki/Extrapolation">extrapolation</a>. An <em>idealized agent</em> is a version of the agent with certain idealizing characteristics (perhaps logical omniscience and infinite speed of thought). An <em>extrapolated agent</em> is a version of the agent that represents what they would be like if they underwent certain changes or experiences. Note two differences between these concepts. First, an extrapolated agent need not be ideal in any sense (though useful extrapolated agents often will be) and certainly need not be <em>perfectly</em> idealized. Second, extrapolated agents are determined by a specific type of process (extrapolation from the original agent) whereas no such restriction is placed on how the form of an idealized agent is determined. CEV utilizes extrapolation rather than idealization, as do some Ideal Advisor theories. In this post, we talk about "ideal" or "idealized" agents as a catch-all for both idealized agents and extrapolated agents.</p> <p><sup>2</sup> Standard objections to ideal advisor theories of value are also relevant to some proposed variants of CEV, for example Tarleton (2010)'s suggestion of "Individual Extrapolated Volition followed by Negotiation, where each individual human&rsquo;s preferences are extrapolated by factual correction and reflection; once that process is fully complete, the extrapolated humans negotiate a combined utility function for the resultant superintelligence..." Furthermore, some objections to Ideal Advisor theories also seem relevant to Global CEV even if they are not relevant to a particular approach to Personal CEV, though that discussion is beyond the scope of this article. As a final clarification, see <a href="/lw/1oj/complexity_of_value_complexity_of_outcome/">Dai (2010)</a>.</p> <p><sup>3</sup> Ideal Advisor theories are not to be confused with "Ideal Observer theory" (<a href="http://www.rci.rutgers.edu/~stich/104_Master_File/104_Readings/Firth/Firth%20-%20Ideal%20Observer.pdf">Firth 1952</a>). 
For more on Ideal Advisor theories of value, see <a href="http://commonsenseatheism.com/wp-content/uploads/2011/09/Zimmerman-Why-Richard-Brandt-does-not-need-cognitive-psychotherapy.pdf">Zimmerman (2003)</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2012/12/Tanyi-An-essay-on-the-desire-based-reasons-model.pdf">Tanyi (2006)</a>; <a href="http://commonsenseatheism.com/wp-content/uploads/2012/12/Enoch-Why-Idealize.pdf">Enoch (2005)</a>;&nbsp;<a href="http://www.amazon.com/Contemporary-Metaethics-Introduction-Alexander-Miller/dp/074564659X/">Miller (2013, ch. 9)</a>.</p> <p><sup>4</sup> This is basically an intrapersonal version of the standard worries about interpersonal comparisons of well-being. The basis of these worries is that even if we can specify an agent&rsquo;s preferences numerically, it&rsquo;s unclear how we should compare the numbers assigned by one agent with the numbers assigned by the other. In the intrapersonal case, the challenge is to determine how to compare the numbers assigned by the same agent at different times. See <a href="http://commonsenseatheism.com/wp-content/uploads/2012/12/Gibbard-Interpersonal-comparisons-preference-good-and-the-intrinsic-reward-of-a-life.pdf">Gibbard (1989)</a>.</p> lukeprog q9ZSXiiA7wEuRgnkS 2012-12-25T13:04:46.889Z Noisy Reasoners https://www.lesswrong.com/posts/h6mNG2nC56sP88w3D/noisy-reasoners <p>One of the more interesting papers at this year's <a href="http://agi-conference.org/2012/">AGI-12 conference</a> was Fintan Costello's <a href="http://www.mindmakers.org/attachments/download/250/paper_61.pdf">Noisy Reasoners</a>. I think it will be of interest to Less Wrong:</p> <blockquote> <p>&nbsp;</p> <p>This paper examines reasoning under uncertainty in the case&nbsp;where the AI reasoning mechanism is itself subject to random error or&nbsp;noise in its own processes. The main result is a demonstration that systematic, directed biases naturally arise if there is random noise in a&nbsp;reasoning process that follows the normative rules of probability theory.&nbsp;A number of reliable errors in human reasoning under uncertainty can&nbsp;be explained as the consequence of these systematic biases due to noise.&nbsp;Since AI systems are subject to noise, we should expect to see the same&nbsp;biases and errors in AI reasoning systems based on probability theory.</p> <p>&nbsp;</p> </blockquote> lukeprog h6mNG2nC56sP88w3D 2012-12-13T07:53:29.193Z
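<p>A toy simulation can illustrate the abstract's central claim. The sketch below is ours, not the model from the paper: it simply adds zero-mean Gaussian noise to a point probability estimate and keeps the result inside [0, 1]; the function names, the noise level, and the test probabilities are invented for illustration. Even this crude setup produces a systematic, directed bias.</p>

<pre><code># Toy illustration (not Costello's actual model): unbiased noise, plus the
# requirement that probability estimates stay in [0, 1], yields directed bias.
import random

def mean_noisy_estimate(true_p, noise_sd=0.25, n_queries=100_000, seed=0):
    """Average many noisy estimates of true_p, each clipped to [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_queries):
        estimate = true_p + rng.gauss(0.0, noise_sd)
        total += min(1.0, max(0.0, estimate))  # clipping is where the bias enters
    return total / n_queries

for p in (0.05, 0.50, 0.95):
    print(f"true = {p:.2f}   mean noisy estimate = {mean_noisy_estimate(p):.3f}")
</code></pre>

<p>On average, small probabilities come out too high and large probabilities too low, even though each individual error is unbiased. That is the kind of reliable, directed error the abstract says we should expect in any noisy reasoner built on probability theory.</p>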