Thoughts on the Singularity Institute (SI) 2012-05-11T04:31:30.364Z
Maximizing Cost-effectiveness via Critical Inquiry 2011-11-10T19:25:14.904Z
Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased) 2011-08-18T23:34:12.099Z


Comment by holdenkarnofsky on Q for GiveWell: What is GiveDirectly's mechanism of action? · 2013-08-05T04:27:00.210Z · LW · GW

Eliezer, I think inflation caused via cash transfers results (under some fairly basic assumptions) in unchanged - not negative - total real wealth for the aggregate set of people experiencing the inflation, because this aggregate set of people includes the same set of people that causes the inflation as a result of having more currency. There may be situations in which "N people receive X units of currency, but the supply of goods they purchase remains fixed, so they experience inflation and do not end up with more real wealth", but not situations in which "N people receive X units of currency and as a result have less real wealth, or cause inflation for others that lowers the others' real wealth more than their own has risen."

If you believe that GiveDirectly's transfers cause negligible inflation for the people receiving them (as implied by the studies we've reviewed), this implies that those people become materially wealthier by (almost) the amount of the transfer. There may be other people along the chain who experience inflation, but these people at worst have unchanged real wealth in aggregate (according to the previous paragraph). (BTW, I've focused in on the implications of your scenario for inflation because we have data regarding inflation.)

It's theoretically possible that the distributive effects within this group are regressive (e.g., perhaps the people who sell to the GiveDirectly recipients then purchase goods that are needed by other lower-income people in another location, raising the price of those goods), but in order to believe this one has to make a seemingly large number of unjustified assumptions (including the assumption of an inelastic supply of the goods demanded by low-income Kenyans, which seems particularly unrealistic), and those distributive effects could just as easily be progressive.

It also seems worth noting that your concern would seem to apply equally to any case in which a donor purchases foreign currency and uses it to fund local services (e.g., from nonprofits), which would seem to include ~all cases of direct aid overseas.

If I'm still not fully addressing your point, it might be worth your trying a "toy economy" construction to elucidate your concern - something along the lines of "Imagine that there are a total of 10 people in Kenya; that 8 are poor, have wealth equal to X, and consume goods A and B at prices Pa and Pb; that 2 are wealthy, have wealth equal to Y, and consume goods C and D at prices Pc and Pd. When the transfer is made, dollars are traded for T shillings, which are then distributed as follows, which has the following impact on prices and real consumption ..." I often find these constructions useful in these sorts of discussions to elucidate exactly the potential scenario one has in mind.
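As one illustration of what such a toy economy could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration rather than a claim about Kenya: the names, the numbers, the idea that all wealth is held as cash, the completely fixed supply of goods (the worst case for inflation), and the rule that the price level equals total money divided by total goods.

```python
def real_wealth(holdings, goods_supply):
    """Real wealth per person when the price level is total money / total goods."""
    price_level = sum(holdings.values()) / goods_supply
    return {name: cash / price_level for name, cash in holdings.items()}


def apply_transfer(holdings, recipients, transfer_each):
    """Return new cash holdings after each recipient receives transfer_each."""
    return {
        name: cash + (transfer_each if name in recipients else 0)
        for name, cash in holdings.items()
    }


# Hypothetical village: two poor households and one merchant, 100 units of goods.
holdings = {"poor_1": 10, "poor_2": 10, "merchant": 80}
goods_supply = 100

before = real_wealth(holdings, goods_supply)
after = real_wealth(apply_transfer(holdings, {"poor_1", "poor_2"}, 50), goods_supply)

print(before)                # real wealth before the transfer
print(after)                 # real wealth after the transfer
print(sum(after.values()))   # aggregate real wealth is still the goods supply
```

Under these assumptions prices double, aggregate real wealth stays equal to the fixed supply of goods, and the change is purely distributive: the recipients gain exactly what the non-recipient loses. Relaxing the fixed-supply assumption would shrink the price increase.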

Comment by holdenkarnofsky on Q for GiveWell: What is GiveDirectly's mechanism of action? · 2013-08-02T18:27:24.714Z · LW · GW

Thanks for the thoughtful post.

If recipients of cash transfers buy Kenyan goods, and the producers of those goods use their extra shillings to buy more Kenyan goods, and eventually someone down the line trades their shillings for USD, this would seem to be equivalent in the relevant ways to the scenario you outline in which "U.S. dollars were being sent directly to Kenyan recipients and used only to purchase foreign goods" - assuming no directly-caused inflation in Kenyan prices. In other words, it seems to me that you're essentially positing a potential offsetting harm of cash transfers in the form of inflation, and in the case where transfers do not cause inflation, there is no concern.

At the micro/village level, we've reviewed two studies showing minor (if any) inflation. At the country level, it's worth noting that the act of buying Kenyan currency with USD should be as deflationary as the act of putting that Kenyan currency back into the economy is inflationary. Therefore, it seems to me that inflation is a fairly minor concern.

I'm not entirely sure I've understood your argument, so let me know if that answer doesn't fully address it.

That said, it's important to note that we do not claim 100% confidence - or an absence of plausible negative/offsetting effects - for any of our top charities. For the intervention of each charity we review, we include a "negative/offsetting effects" section that lists possible negative/offsetting effects, and in most cases we can't conclusively dismiss such effects. Nonetheless, having noted and considered the possible negative/offsetting effects, we believe the probability that our top charities are accomplishing substantial net good is quite high, higher than for any other giving opportunities we're aware of.

Comment by holdenkarnofsky on Bayesian Adjustment Does Not Defeat Existential Risk Charity · 2013-03-26T19:51:36.938Z · LW · GW

Responses on some more minor points (see my previous comment for big-picture responses):

Regarding "BA updates on a point estimate rather than on the full evidence that went into the point estimate" - I don't understand this claim. BA updates on the full probability distribution of the estimate, which takes into account potential estimate error. The more robust the estimate, the smaller the BA.
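A minimal sketch of the "more robust estimate, smaller adjustment" relationship, assuming a normal prior and a normally distributed estimate error (the function name and all numbers are hypothetical, chosen only for illustration):

```python
def bayesian_adjustment(prior_mean, prior_sd, estimate, estimate_sd):
    """Return (posterior_mean, adjustment) for a normal prior and normal estimate error.

    The adjustment is how far the posterior mean sits below the raw estimate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    posterior_mean = (
        prior_mean * prior_precision + estimate * estimate_precision
    ) / (prior_precision + estimate_precision)
    return posterior_mean, estimate - posterior_mean


# Same prior and same point estimate; only the estimate's error bars differ.
robust_mean, robust_adj = bayesian_adjustment(0.0, 1.0, 10.0, 0.5)
noisy_mean, noisy_adj = bayesian_adjustment(0.0, 1.0, 10.0, 2.0)

print(robust_mean, robust_adj)  # narrow error bars: small adjustment
print(noisy_mean, noisy_adj)    # wide error bars: large adjustment
```

The update uses the estimate's standard deviation as well as its central value, which is the sense in which it operates on the whole distribution rather than on a bare point estimate.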

Regarding "double-counting" priors, I have not advocated for doing both an explicit "skepticism discount" in one's EEV calculation and then performing a BA on the output based on the same reasons for skepticism. Instead, I've discussed the pros and cons of these two different approaches to accounting for skepticism. There are cases in which I think some sources of skepticism (such as "only 10% of studies in this reference class are replicable") should be explicitly adjusted for, while others ("If a calculation tells me that an action is the best I can take, I should be skeptical because the conclusion is a priori unlikely") should be implicitly adjusted for. But I don't believe anything I've said implies that one should "double-count priors."

Regarding "log-normal priors would lead to different graphs in the second post, weakening the conclusion. To take the expectation of the logarithm and interpret that as the logarithm of the true cost-effectiveness is to bias the result downward." - FWIW, I did a version of my original analysis using log-normal distributions (including the correct formula for the expected value) and the picture didn't change much. I don't think this issue is an important one though I'm open to being convinced otherwise by detailed analysis.
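For reference, the downward bias being discussed can be sketched as follows. If log(X) is normal with mean mu and standard deviation sigma, the expected value of X is exp(mu + sigma^2 / 2), which exceeds exp(mu), the value obtained by exponentiating the expected logarithm. The parameter values below are hypothetical.

```python
import math


def lognormal_mean(mu, sigma):
    """Expected value of X when log(X) ~ Normal(mu, sigma^2)."""
    return math.exp(mu + sigma ** 2 / 2)


def exp_of_expected_log(mu):
    """exp(E[log X]); this is the median of X, not its mean."""
    return math.exp(mu)


mu, sigma = 0.0, 1.0
print(lognormal_mean(mu, sigma))   # correct expected value
print(exp_of_expected_log(mu))     # smaller: biased downward
```

The gap between the two grows with sigma, so the size of the bias depends on how wide the distribution is.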

I don't find the "charity doomsday argument" compelling. One could believe in low probability of extinction by (a) disputing that our current probability of extinction is high to begin with, or (b) accepting that it's high but disputing that it can only be lowered by a donation to one of today's charities (it could be lowered by a large set of diffuse actions, or by a small number of actions whose ability to get funding is overdetermined, or by a far-future charity, or by a combination). If one starts off believing that probability of extinction is high and that it can only be lowered by a particular charity working today that cannot close its funding gap without help from oneself, this seems to beg the question. (I don't believe this set of propositions.)

I don't believe any of the alternative solutions to "Pascal's Mugging" are compelling for all possible constructions of "Pascal's Mugging." The only one that seems difficult to get around by modifying the construction is the "bounded utility function" solution, but I don't believe it is reasonable to have a bounded utility function: I believe, for example, that one should be willing to pay $100 for a 1/N chance of saving N lives for any N>=1, if (as is not the case with "Pascal's Mugging") the "1/N chance of saving N lives" calculation is well supported and therefore robust (i.e., has relatively narrow error bars). Thus, "Pascal's Mugging" remains an example of the sort of "absurd implication" I'd expect for an insufficiently skeptical prior.

Finally, regarding "a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years." - I'm not aware of reasons to believe it's clear that it would be easier to reduce extinction risk by a percentage point than to speed colonization by 10 million years. If the argument is simply that "a single percentage point seems like a small number," then I believe this is simply an issue of framing, a case of making something very difficult sound easy by expressing it as a small probability of a fantastically difficult accomplishment. Furthermore, I believe that what you call "speedup" reduces net risk of extinction, so I don't think the comparison is valid. (I will elaborate on this belief in the future.)

Comment by holdenkarnofsky on Bayesian Adjustment Does Not Defeat Existential Risk Charity · 2013-03-26T19:49:13.254Z · LW · GW

Thanks for this post - I really appreciate the thoughtful discussion of the arguments I've made.

I'd like to respond by (a) laying out what I believe is a big-picture point of agreement, which I consider more important than any of the disagreements; (b) responding to what I perceive as the main argument this post makes against the framework I've advanced; (c) responding on some more minor points. (c) will be a separate comment due to length constraints.

A big-picture point of agreement: the possibility of vast utility gain does not - in itself - disqualify a giving opportunity as a good one, nor does it establish that the giving opportunity is strong. I'm worried that this point of agreement may be lost on many readers.

The OP makes it sound as though I believe that a high enough EEV is "ruled out" by priors; as discussed below, that is not my position. I agree, and always have, that "Bayesian adjustment does not defeat existential risk charity"; however, I think it defeats an existential risk charity that makes no strong arguments for its ability to make an impact, and relies on a "Pascal's Mugging" type argument for its appeal.

On the flip side, I believe that a lot of readers believe that "Pascal's Mugging" type arguments are sufficient to establish that a particular giving opportunity is outstanding. I don't believe the OP believes this.

I believe the OP and I are in agreement that one should support an existential risk charity if and only if it makes a strong overall case for its likely impact, a case that goes beyond the observation that even a tiny probability of success would imply high expected value. We may disagree on precisely how high the burden of argumentation is, and we probably disagree on whether MIRI clears that hurdle in its current form, but I don't believe either of us thinks the burden of argumentation is trivial or is so high that it can never be reached.

Response to what I perceive as the main argument of this post

It seems to me that the main argument of this post runs as follows:

  • The priors I'm using imply extremely low probabilities for certain events.
  • We don't have sufficient reasons to confidently assign such low probabilities to such events.

I think the biggest problems with this argument are as follows:

1 - Most importantly, nothing I've written implies an extremely low probability for any particular event. Nick Beckstead's comment on this post lays out the thinking here. The prior I describe isn't over expected lives saved or DALYs saved (or a similar metric); it's over the merit of a proposed action relative to the merits of other possible actions. So if one estimates that action A has a 10^-10 chance of saving 10^30 lives, while action B has a 50% chance of saving 1 life, one could be wrong about the difference between A and B by (a) overestimating the probability that action A will have the intended impact; (b) underestimating the potential impact of action B; (c) leaving out other consequences of A and B; (d) making some other mistake.

My current working theory is that proponents of "Pascal's Mugging" type arguments tend to neglect the "flow-through effects" of accomplishing good. There are many ways in which helping a person may lead to others' being helped, and ultimately may lead to a small probability of an enormous impact. Nick Beckstead raises a point similar to this one, and the OP has responded that it's a new and potentially compelling argument to him. I also think it's worth bearing in mind that there could be other arguments that we haven't thought of yet - and because of the structure of the situation, I expect such arguments to be more likely to point to further "regression to the mean" (so to make proponents of "Pascal's Mugging" arguments less confident that their proposed actions have high relative expected value) than to point in the other direction. This general phenomenon is a major reason that I place less weight on explicit arguments than many in this community - explicit arguments that consist mostly of speculation aren't very stable or reliable, and when "outside views" point the other way, I expect more explicit reflection to generate more arguments that support the "outside views."

2 - That said, I don't accept any of the arguments given here for why it's unacceptable to assign a very low probability to a proposition. I think there is a general confusion here between "low subjective probability that a proposition is correct" and "high confidence that a proposition isn't correct"; I don't think those two things are equivalent. Probabilities are often discussed with an "odds" framing, with the implication that assigning a 10^-10 probability to something means that I'd be willing to wager $10^10 against $1; this framing is a useful thought experiment in many cases, but when the numbers are like this I think it starts encouraging people to confuse their risk aversion with "non-extreme" (i.e., rarely under 1% or over 99%) subjective probabilities. Another framing is to ask, "If we could somehow do a huge number of 'trials' of this idea, say by simulating worlds constrained by the observations you've made, what would your over/under be for the proportion of trials in which the proposition is true?" and in that case one could simultaneously have an over/under of (10^-10 * # trials) and have extremely low confidence in one's view.

It seems to me that for any small p, there must be some propositions that we assign a probability at least as small as p. (For example, there must be some X such that the probability of an impact greater than X is smaller than p.) Furthermore, it isn't the case that assigning small p means that it's impossible to gather evidence that would change one's mind about p. For example, if you state to me that you will generate a random integer N1 between 1 and 10^100, there must be some integer N2 that I implicitly assign a probability of <=10^-100 as the output of your exercise. (This is true even if there are substantial "unknown unknowns" involved, for example if I don't trust that your generator is truly random.) Yet if you complete the exercise and tell me it produced the number N2, I quickly revise my probability from <=10^-100 to over 50%, based on a single quick observation.
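The random-integer example can be worked through with exact arithmetic. This sketch rests on a hypothetical model of the reporter that is not in the argument above: the reporter misstates the result with probability `error_rate`, and when misstating names one of the other integers uniformly at random.

```python
from fractions import Fraction


def posterior_after_report(n_outcomes, error_rate):
    """P(output was N2 | reporter says N2), starting from a uniform prior."""
    prior = Fraction(1, n_outcomes)
    # Reporter says N2 and the output really was N2.
    likelihood_if_true = 1 - error_rate
    # Reporter says N2 although the output was some other number.
    likelihood_if_false = error_rate * Fraction(1, n_outcomes - 1)
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)


n = 10 ** 100
prior = Fraction(1, n)
posterior = posterior_after_report(n, Fraction(1, 10))

print(float(prior))      # prior probability of any specific integer
print(posterior)         # posterior after a single report
```

Under this model the posterior equals one minus the error rate regardless of how small the prior was, so a 10^-100 prior is no obstacle to ending up well above 50% after one observation.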

For these reasons, I think the argument that "the mere fact that one assigns a sufficiently low probability to a proposition means that one must be in error" would have unacceptable implications and is not supported by the arguments in the OP.

Comment by holdenkarnofsky on Reply to Holden on The Singularity Institute · 2012-08-01T14:16:55.899Z · LW · GW

I greatly appreciate the response to my post, particularly the highly thoughtful responses of Luke (original post), Eliezer, and many commenters.

Broad response to Luke's and Eliezer's points:

As I see it, there are a few possible visions of SI's mission:

  • M1. SI is attempting to create a team to build a "Friendly" AGI.
  • M2. SI is developing "Friendliness theory," which addresses how to develop a provably safe/useful/benign utility function without needing iterative/experimental development; this theory could be integrated into an AGI developed by another team, in order to ensure that its actions are beneficial.
  • M3. SI is broadly committed to reducing AGI-related risks, and works on whatever will advance that goal, including potentially M1 and M2.

My view is that the broader SI's mission, the higher the bar should be for the overall impressiveness of the organization and team. An organization with a very narrow, specific mission - such as "analyzing how to develop a provably safe/useful/benign utility function without needing iterative/experimental development" - can, relatively easily, establish which other organizations (if any) are trying to provide what it does and what the relative qualifications are; it can set clear expectations for deliverables over time and be held accountable to them; its actions and outputs are relatively easy to criticize and debate. By contrast, an organization with broader aims and less clearly relevant deliverables - such as "broadly aiming to reduce risks from AGI, with activities currently focused on community-building" - is giving a donor (or evaluator) less to go on in terms of what the space looks like, what the specific qualifications are and what the specific deliverables are. In this case it becomes more important that a donor be highly confident in the exceptional effectiveness of the organization and team as a whole.

Many of the responses to my criticisms (points #1 and #4 in Eliezer's response; "SI's mission assumes a scenario that is far less conjunctive than it initially appears" and "SI's goals and activities" section of Luke's response) correctly point out that they have less force, as criticisms, when one views SI's mission as relatively broad. However, I believe that evaluating SI by a broader mission raises the burden of affirmative arguments for SI's impressiveness. The primary such arguments I see in the responses are in Luke's list:

(1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world, and (3) the Singularity Summit, a mainstream-aimed conference that brings in people who end up making significant contributions to the movement — e.g. Tomer Kagan (an SI donor and board member) and David Chalmers (author of The Singularity: A Philosophical Analysis and The Singularity: A Reply).

I've been a consumer of all three of these, and while I've found them enjoyable, I don't find them sufficient for the purpose at hand. Others may reach a different conclusion. And of course, I continue to follow SI's progress, as I understand that it may submit more impressive achievements in the future.

Both Luke and Eliezer seem to disagree with the basic approach I'm taking here. They seem to believe that it is sufficient to establish that (a) AGI risk is an overwhelmingly important issue and that (b) SI compares favorably to other organizations that explicitly focus on this issue. For my part, I (a) disagree with the statement: "the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole"; (b) do not find Luke's argument that AI, specifically, is the most important existential risk to be compelling (it discusses only how beneficial it would be to address the issue well, not how likely a donor is to be able to help do so); (c) believe it is appropriate to compare the overall organizational impressiveness of the Singularity Institute to that of all other donation-soliciting organizations, not just to that of other existential-risk- or AGI-focused organizations. I would guess that these disagreements, particularly (a) and (c), come down to relatively deep worldview differences (related to the debate over "Pascal's Mugging") that I will probably write more about in the future.

On tool AI:

Most of my disagreements with SI representatives seem to be over how broad a mission is appropriate for SI, and how high a standard SI as an organization should be held to. However, the debate over "tool AI" is different, with both sides making relatively strong claims. Here SI is putting forth a specific point as an underappreciated insight and thus as a potential contribution/accomplishment; my view is that SI's suggested approach to AGI development is more dangerous than the "traditional" approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.

My latest thoughts on this disagreement were posted separately in a comment response to Eliezer's post on the subject.

A few smaller points:

  • I disagree with Luke's claim that "objection #1 punts to objection #2." Objection #2 (regarding "tool AI") points out one possible approach to AGI that I believe is both consonant with traditional software development and significantly safer than the approach advocated by SI. But even if the "tool AI" approach is not in fact safer, there may be safer approaches that SI hasn't thought of. SI does not just emphasize the general problem that AGI may be dangerous (something that I believe is a fairly common view), but emphasizes a particular approach to AGI safety, one that seems to me to be highly dangerous. If SI's approach is dangerous relative to other approaches that others are taking/advocating, or even approaches that have yet to be developed (and will be enabled by future tools and progress on AGI), this is a problem for SI.
  • Luke states that rationality is "only a ceteris paribus predictor of success" and that it is a "weak one." I wish to register that I believe rationality is a strong (though not perfect) predictor of success, within the population of people who are as privileged (in terms of having basic needs met, access to education, etc.) as most SI supporters/advocates/representatives. So while I understand that success is not part of the definition of rationality, I stand by my statement that it is "the best evidence of superior general rationality (or of insight into it)."
  • Regarding donor-advised funds: opening an account with Vanguard, Schwab or Fidelity is a simple process, and I doubt any of these institutions would overrule a recommendation to donate to an organization such as SI (in any case, this is easily testable).

Comment by holdenkarnofsky on Reply to Holden on 'Tool AI' · 2012-08-01T14:09:11.199Z · LW · GW

To summarize how I see the current state of the debate over "tool AI":

  • Eliezer and I have differing intuitions about the likely feasibility, safety and usefulness of the "tool" framework relative to the "Friendliness theory" framework, as laid out in this exchange. This relates mostly to Eliezer's point #2 in the original post. We are both trying to make predictions about a technology for which many of the details are unknown, and at this point I don't see a clear way forward for resolving our disagreements, though I did make one suggestion in that thread.
  • Eliezer has also made two arguments (#1 and #4 in the original post) that appear to be of the form, "Even if the 'tool' approach is most promising, the Singularity Institute still represents a strong giving opportunity." A couple of thoughts on this point:
    • One reason I find the "tool" approach relevant in the context of SI is that it resembles what I see as the traditional approach to software development. My view is that it is likely to be both safer and more efficient for developing AGI than the "Friendliness theory" approach. If this is the case, it seems that the safety of AGI will largely be a function of the competence and care with which its developers execute on the traditional approach to software development, and the potential value-added of a third-party team of "Friendliness specialists" is unclear.
    • That said, I recognize that SI has multiple conceptually possible paths to impact, including developing AGI itself and raising awareness of the risks of AGI. I believe that the more the case for SI revolves around activities like these rather than around developing "Friendliness theory," the higher the bar for SI's general impressiveness (as an organization and team) becomes; I will elaborate on this when I respond to Luke's response to me.
  • Regarding Eliezer's point #3 - I think this largely comes down to how strong one finds the argument for "tool AI." I agree that one shouldn't expect SI to respond to every possible critique of its plans. But I think it's reasonable to expect it to anticipate and respond to the stronger possible critiques.
  • I'd also like to address two common objections to the "tool AI" framework that came up in comments, though neither of these objections appears to have been taken up in official SI responses.
    • Some have argued that the idea of "tool AI" is incoherent, or is not distinct from the idea of "Oracle AI," or is conceptually impossible. I believe these arguments to be incorrect, though my ability to formalize and clarify my intuitions on this point has been limited. For those interested in reading attempts to better clarify the concept of "tool AI" following my original post, I recommend jsalvatier's comments on the discussion post devoted to this topic as well as my exchange with Eliezer elsewhere on this thread.
    • Some have argued that "agents" are likely to be more efficient and powerful than "tools," since they are not bottlenecked by human input, and thus that the "tool" concept is unimportant. I anticipated this objection in my original post and expanded on my response in my exchange with Eliezer elsewhere on this thread. In a nutshell, I believe the "tool" framework is likely to be a faster and more efficient way of developing a capable and useful AGI than the sort of framework for which "Friendliness theory" would be relevant; and if it isn't, that the sort of work SI is doing on "Friendliness theory" is likely to be of little value. (Again, I recognize that SI has multiple conceptually possible paths to impact other than development of "Friendliness theory" and will address these in a future comment.)

Comment by holdenkarnofsky on Reply to Holden on 'Tool AI' · 2012-07-18T16:29:00.048Z · LW · GW

Thanks for the response. My thoughts at this point are as follows:

  • We seem to have differing views of how to best do what you call "reference class tennis" and how useful it can be. I'll probably be writing about my views more in the future.
  • I find it plausible that AGI will have to follow a substantially different approach from "normal" software. But I'm not clear on the specifics of what SI believes those differences will be and why they point to the "proving safety/usefulness before running" approach over the "tool" approach.
  • We seem to have differing views of how frequently today's software can be made comprehensible via interfaces. For example, my intuition is that the people who worked on the Netflix Prize algorithm had good interfaces for understanding "why" it recommends what it does, and used these to refine it. I may further investigate this matter (casually, not as a high priority); on SI's end, it might be helpful (from my perspective) to provide detailed examples of existing algorithms for which the "tool" approach to development didn't work and something closer to "proving safety/usefulness up front" was necessary.

Comment by holdenkarnofsky on Reply to Holden on 'Tool AI' · 2012-07-18T02:35:33.709Z · LW · GW

Thanks for the response. To clarify, I'm not trying to point to the AIXI framework as a promising path; I'm trying to take advantage of the unusually high degree of formalization here in order to gain clarity on the feasibility and potential danger points of the "tool AI" approach.

It sounds to me like your two major issues with the framework I presented are (to summarize):

(1) There is a sense in which AIXI predictions must be reducible to predictions about the limited set of inputs it can "observe directly" (what you call its "sense data").

(2) Computers model the world in ways that can be unrecognizable to humans; it may be difficult to create interfaces that allow humans to understand the implicit assumptions and predictions in their models.

I don't claim that these problems are trivial to deal with. And stated as you state them, they sound abstractly very difficult to deal with. However, it seems true - and worth noting - that "normal" software development has repeatedly dealt with them successfully. For example: Google Maps works with a limited set of inputs; Google Maps does not "think" like I do and I would not be able to look at a dump of its calculations and have any real sense for what it is doing; yet Google Maps does make intelligent predictions about the external universe (e.g., "following direction set X will get you from point A to point B in reasonable time"), and it also provides an interface (the "route map") that helps me understand its predictions and the implicit reasoning (e.g. "how, why, and with what other consequences direction set X will get me from point A to point B").

Difficult though it may be to overcome these challenges, my impression is that software developers have consistently - and successfully - chosen to take them on, building algorithms that can be "understood" via interfaces and iterated over - rather than trying to prove the safety and usefulness of their algorithms with pure theory before ever running them. Not only does the former method seem "safer" (in the sense that it is less likely to lead to putting software in production before its safety and usefulness have been established) but it seems a faster path to development as well.

It seems that you see a fundamental disconnect between how software development has traditionally worked and how it will have to work in order to result in AGI. But I don't understand your view of this disconnect well enough to see why it would lead to a discontinuation of the phenomenon I describe above. In short, traditional software development seems to have an easier (and faster and safer) time overcoming the challenges of the "tool" framework than overcoming the challenges of up-front theoretical proofs of safety/usefulness; why should we expect this to reverse in the case of AGI?

Comment by holdenkarnofsky on Reply to Holden on 'Tool AI' · 2012-07-05T16:18:16.305Z · LW · GW


I appreciate the thoughtful response. I plan to respond at greater length in the future, both to this post and to some other content posted by SI representatives and commenters. For now, I wanted to take a shot at clarifying the discussion of "tool-AI" by discussing AIXI. One of the issues I've found with the debate over FAI in general is that I haven't seen much in the way of formal precision about the challenge of Friendliness (I recognize that I have also provided little formal precision, though I feel the burden of formalization is on SI here). It occurred to me that AIXI might provide a good opportunity to have a more precise discussion, if in fact it is believed to represent a case of "a rare exception who specified his AGI in such unambiguous mathematical terms that he actually succeeded at realizing, after some discussion with SIAI personnel, that AIXI would kill off its users and seize control of its reward button."

So here's my characterization of how one might work toward a safe and useful version of AIXI, using the "tool-AI" framework, if one could in fact develop an efficient enough approximation of AIXI to qualify as a powerful AGI. Of course, this is just a rough outline of what I have in mind, but hopefully it adds some clarity to the discussion.

A. Write a program that

  1. Computes an optimal policy, using some implementation of equation (20) on page 22 of
  2. "Prints" the policy in a human-readable format (using some fixed algorithm for "printing" that is not driven by a utility function)
  3. Provides tools for answering user questions about the policy, i.e., "What will be its effect on ___?" (using some fixed algorithm for answering user questions that makes use of AIXI's probability function, and is not driven by a utility function)
  4. Does not contain any procedures for "implementing" the policy, only for displaying it and its implications in human-readable form

B. Run the program; examine its output using the tools described above (#2 and #3); if, upon such examination, the policy appears potentially destructive, continue tweaking the program (for example, by tweaking the utility it is selecting a policy to maximize) until the policy appears safe and desirable

C. Implement the policy using tools other than the AIXI agent

D. Repeat (B) and (C) until one has confidence that the AIXI agent reliably produces safe and desirable policies, at which point more automation may be called for
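
To make steps A and B concrete, here is a minimal toy sketch in Python. It is not AIXI and makes no claim about how an AIXI approximation would be built: the outcome table, the action names, the two candidate utility functions, and the reviewer rule are all hypothetical stand-ins chosen for illustration. The only point it shows is the shape of the workflow: the program computes a policy, displays it, answers questions about it with fixed lookups, and never executes it.

```python
from typing import Callable, Dict, List

# Hypothetical toy stand-in for the policy search in step A.1. This is NOT AIXI,
# just an exhaustive argmax over a tiny hand-written outcome table.
OUTCOMES: Dict[str, Dict[str, float]] = {
    "seize_reward_button": {"reward": 10.0, "harm": 9.0},
    "cure_disease": {"reward": 7.0, "harm": 0.5},
    "do_nothing": {"reward": 0.0, "harm": 0.0},
}

Utility = Callable[[Dict[str, float]], float]


def compute_policy(utility: Utility) -> str:
    """A.1: pick the action that the given utility function scores highest."""
    return max(OUTCOMES, key=lambda action: utility(OUTCOMES[action]))


def print_policy(policy: str) -> str:
    """A.2: fixed rendering of the policy; no utility function is involved."""
    return f"Proposed policy: {policy}"


def predicted_effect(policy: str, quantity: str) -> float:
    """A.3: fixed lookup answering 'What will be its effect on ___?'."""
    return OUTCOMES[policy][quantity]


def review_loop(utilities: List[Utility], looks_safe: Callable[[str], bool]) -> str:
    """B: try candidate utilities until the human reviewer accepts a policy.

    Nothing here carries out the policy (A.4); it is only returned for display.
    """
    for utility in utilities:
        policy = compute_policy(utility)
        if looks_safe(policy):
            return policy
    raise RuntimeError("no candidate utility produced an acceptable policy")


def naive(outcome: Dict[str, float]) -> float:
    # First attempt: maximize raw reward only.
    return outcome["reward"]


def penalized(outcome: Dict[str, float]) -> float:
    # Tweaked attempt: the same reward, minus a penalty on predicted harm.
    return outcome["reward"] - 2.0 * outcome["harm"]


def reviewer(policy: str) -> bool:
    # Stand-in for human inspection using the question-answering tool.
    return predicted_effect(policy, "harm") < 1.0


chosen = review_loop([naive, penalized], reviewer)
print(print_policy(chosen))
```

In this toy, the first utility selects the policy that the reviewer rejects, and the tweaked utility selects one that passes review. Steps C and D (implementing the policy by other means, then repeating) happen outside the program.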

My claim is that this approach would be superior to that of trying to develop "Friendliness theory" in advance of having any working AGI, because it would allow experiment- rather than theory-based development. Eliezer, I'm interested in your thoughts about my claim. Do you agree? If not, where is our disagreement?

Comment by holdenkarnofsky on Thoughts on the Singularity Institute (SI) · 2012-05-10T16:12:28.639Z · LW · GW

Thanks for pointing this out. The links now work, though only from the permalink version of the page (not from the list of new posts).

Comment by holdenkarnofsky on Singularity Institute $100,000 end-of-year fundraiser only 20% filled so far · 2012-01-17T20:04:23.623Z · LW · GW

Carl, it looks like we have a pretty substantial disagreement about key properties of the appropriate prior distribution over expected value of one's actions.

I am not sure whether you are literally endorsing a particular distribution (I am not sure whether "Solomonoff complexity prior" is sufficiently well-defined or, if so, whether you are endorsing that or a varied/adjusted version). I myself have not endorsed a particular distribution. So it seems like the right way to resolve our disagreement is for at least one of us to be more specific about what properties are core to our argument and why we believe any reasonable prior ought to have these properties. I'm not sure when I will be able to do this on my end and will likely contact you by email when I do.

What I do not agree with is the implication that my analysis is irrelevant to Pascal's Mugging. It may be irrelevant for people who endorse the sorts of priors you endorse. But not everyone agrees with you about what the proper prior looks like, and many people who are closer to me on what the appropriate prior looks like still seem unaware of the implications for Pascal's Mugging. If nothing else, my analysis highlights a relationship between one's prior distribution and Pascal's Mugging that I believe many others weren't aware of. Whether it is a decisive refutation of Pascal's Mugging is unresolved (and depends on the disagreement I refer to above).

Comment by holdenkarnofsky on Singularity Institute $100,000 end-of-year fundraiser only 20% filled so far · 2011-12-29T00:37:24.706Z · LW · GW

Louie, I think you're mischaracterizing these posts and their implications. The argument is much closer to "extraordinary claims require extraordinary evidence" than it is to "extraordinary claims should simply be disregarded." And I have outlined (in the conversation with SIAI) ways in which I believe SIAI could generate the evidence needed for me to put greater weight on its claims.

I wrote more in my comment followup on the first post about why an aversion to arguments that seem similar to "Pascal's Mugging" does not entail an aversion to supporting x-risk charities. (As mentioned in that comment, it appears that important SIAI staff share such an aversion, whether or not they agree with my formal defense of it.)

I also think the message of these posts is consistent with the best available models of how the world works - it isn't just about trying to set incentives. That's probably a conversation for another time - there seems to be a lot of confusion on these posts (especially the second) and I will probably post some clarification at a later date.

Comment by holdenkarnofsky on Singularity Institute $100,000 end-of-year fundraiser only 20% filled so far · 2011-12-28T15:04:16.252Z · LW · GW

Hi, here are the details of whom I spoke with and why:

  • I originally emailed Michael Vassar, letting him know I was going to be in the Bay Area and asking whether there was anyone appropriate for me to meet with. He set me up with Jasen Murray.
  • Justin Shovelain and an SIAI donor were also present when I spoke with Jasen. There may have been one or two others; I don't recall.
  • After we met, I sent the notes to Jasen for review. He sent back comments and also asked me to run it by Amy Willey and Michael Vassar, who each provided some corrections via email that I incorporated.

A couple of other comments:

  • If SIAI wants to set up another room for more funding discussion, I'd be happy to do that and to post new notes.
  • In general, we're always happy to post corrections or updates on any content we post, including how that content is framed and presented. The best way to get our attention is to email us at

And a tangential comment/question for Louie: I do not understand why you link to my two LW posts using the anchor text you use. These posts are not about GiveWell's process. They both argue that standard Bayesian inference indicates against the literal use of non-robust expected value estimates, particularly in "Pascal's Mugging" type scenarios. Michael Vassar's response to the first of these was that I was attacking a straw man. There are unresolved disagreements about some of the specific modeling assumptions and implications of these posts, but I don't see any way in which they imply a "limited process" or "blinding to the possibility of SIAI's being a good giving opportunity." I do agree that SIAI hasn't been a fit for our standard process (and is more suited to GiveWell Labs) but I don't see anything in these posts that illustrates that - what do you have in mind here?

Comment by holdenkarnofsky on Maximizing Cost-effectiveness via Critical Inquiry · 2011-11-12T21:31:44.429Z · LW · GW

A few quick notes:

  • As I wrote in my response to Carl on The GiveWell Blog, the conceptual content of this post does not rely on the assumption that the value of donations (as measured in something like "lives saved" or "DALYs saved") is normally distributed. In particular, a lognormal distribution fits easily into the above framework.

  • I recognize that my model doesn't perfectly describe reality, especially for edge cases. However, I think it is more sophisticated than any model I know of that contradicts its big-picture conceptual conclusions (e.g., by implying "the higher your back-of-the-envelope [extremely error-prone] expected-value calculation, the necessarily higher your posterior expected-value estimate") and that further sophistication would likely leave the big-picture conceptual conclusions in place.

  • JGWeissman is correct that I meant "maximum" when I said "inflection point."
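
As a rough illustration of why a "maximum" can appear at all, the sketch below applies the standard normal-normal Bayesian update to a range of back-of-the-envelope estimates. The specific numbers are assumptions made only for this example: a prior with mean 0 and standard deviation 1, and an estimate error whose standard deviation is set equal to the estimate itself (so bigger claims are also noisier claims). The function name `posterior_mean` is likewise just a label for this sketch.

```python
def posterior_mean(prior_mean: float, prior_sd: float,
                   estimate: float, error_sd: float) -> float:
    """Precision-weighted average of a normal prior and a normal estimate."""
    prior_precision = 1.0 / prior_sd ** 2
    error_precision = 1.0 / error_sd ** 2
    weighted_sum = prior_precision * prior_mean + error_precision * estimate
    return weighted_sum / (prior_precision + error_precision)


# Assumed for illustration: estimates from 0.25 to 10.0, each with an error
# standard deviation equal to the estimate itself.
estimates = [0.25 * k for k in range(1, 41)]
adjusted = [posterior_mean(0.0, 1.0, x, error_sd=x) for x in estimates]

# Find where the adjusted (posterior) value peaks.
best = max(range(len(estimates)), key=lambda i: adjusted[i])
print("estimate with highest adjusted value:", estimates[best])
print("adjusted value there:", adjusted[best])
print("adjusted value for the largest estimate:", adjusted[-1])
```

Under these assumptions the adjusted value rises, peaks, and then falls as the raw estimate keeps growing, so a higher raw estimate does not necessarily mean a higher posterior estimate.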

Comment by holdenkarnofsky on Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased) · 2011-08-29T16:31:00.270Z · LW · GW

Hello all,

Thanks for the thoughtful comments. Without responding to all threads, I'd like to address a few of the themes that came up. FYI, there are also interesting discussions of this post at The GiveWell Blog, Overcoming Bias, and Quomodocumque (the latter includes Terence Tao's thoughts on "Pascal's Mugging").

On what I'm arguing. There seems to be confusion on which of the following I am arguing:

(1) The conceptual idea of maximizing expected value is problematic.

(2) Explicit estimates of expected value are problematic and can't be taken literally.

(3) Explicit estimates of expected value are problematic/can't be taken literally when they don't include a Bayesian adjustment of the kind outlined in my post.

As several have noted, I do not argue (1). I do try to give with the aim of maximizing expected good accomplished, and in particular I consider myself risk-neutral in giving.

I strongly endorse (3) and there doesn't seem to be disagreement on this point.

I endorse (2) as well, though less strongly than I endorse (3). I am open to the idea of formally performing a Bayesian adjustment, and if this formalization is well done enough, taking the adjusted expected-value estimate literally. However,

  • I have examined a lot of expected-value estimates relevant to giving, including those done by the DCP2, Copenhagen Consensus, and Poverty Action Lab, and have never once seen a formalized adjustment of this kind.

  • I believe that often - particularly in the domains discussed here - formalizing such an adjustment in a reasonable way is simply not feasible and that using intuition is superior. This is argued briefly in this post, and Dario Amodei and Jonah Sinick have an excellent exchange further exploring this idea at the GiveWell Blog.

  • If you disagree with the above point, and feel that such adjustments ought to be done formally, then you do disagree with a substantial part of my post; however, you ought to find the remainder of the post more consequential than I do, since it implies substantial room for improvement in the most prominent cost-effectiveness estimates (and perhaps all cost-effectiveness estimates) in the domains under discussion.

All of the above applies to expected-value calculations that take relatively large amounts of guesswork, such as in the domain of giving. There are expected-value estimates that I feel are precise/robust enough to take literally.

Is it reasonable to model existential risk reduction and/or "Pascal's Mugging" using log-/normal distributions? Several have pointed out that existential risk reduction and "Pascal's Mugging" seem to involve "either-or" scenarios that aren't well approximated by log-/normal distributions. I wish to emphasize that I'm focused on the prior over expected value of one's actions and on the distribution of error in one's expected-value estimate. (The latter is a fuzzy concept that may be best formalized with the aid of concepts such as imprecise probability. In the scenarios under discussion, one often must estimate the probability of catastrophe essentially by making a wild guess with a wide confidence interval, leaving wide room for "estimate error" around the expected-value calculation.) Bayesian adjustments to expected-value estimates of actions, in this framework, are smaller (all else equal) for well-modeled and well-understood "either-or" scenarios than for poorly-modeled and poorly-understood "either-or" scenarios.

For both the prior and for the "estimate error," I think the log-/normal distribution can be a reasonable approximation, especially when considering the uncertainty around the impact of one's actions on the probability of catastrophe.
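
A small numerical sketch of the comparison above, assuming normal distributions for both the prior and the estimate error. All numbers are hypothetical: a prior centered on 0 with standard deviation 1, and two actions that share the same raw expected-value estimate of 10 but differ in how well understood the underlying scenario is. The helper `adjust` is a name introduced only for this example.

```python
from typing import Tuple


def adjust(prior_mean: float, prior_sd: float,
           estimate: float, error_sd: float) -> Tuple[float, float]:
    """Normal-normal update: return (posterior mean, posterior sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    error_precision = 1.0 / error_sd ** 2
    posterior_variance = 1.0 / (prior_precision + error_precision)
    mean = posterior_variance * (prior_precision * prior_mean
                                 + error_precision * estimate)
    return mean, posterior_variance ** 0.5


# Hypothetical inputs: same raw estimate, different amounts of guesswork.
well_modeled_mean, _ = adjust(0.0, 1.0, estimate=10.0, error_sd=1.0)
poorly_modeled_mean, _ = adjust(0.0, 1.0, estimate=10.0, error_sd=10.0)

print("well-modeled scenario, adjusted estimate:", well_modeled_mean)
print("poorly-modeled scenario, adjusted estimate:", poorly_modeled_mean)
```

With these assumed inputs, the well-modeled estimate keeps half of its raw value after adjustment, while the poorly-modeled one is pulled almost all the way back to the prior. A lognormal version would apply the same update to the logarithms of the values.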

The basic framework of this post still applies, and many of its conclusions may as well, even when other types of probability distributions are assumed.

My views on existential risk reduction are outside the scope of this post. The only mention I make of existential risk reduction is to critique the argument that "charities working on reducing the risk of sudden human extinction must be the best ones to support, since the value of saving the human race is so high that 'any imaginable probability of success' would lead to a higher expected value for these charities than for others." Note that Eliezer Yudkowsky and Michael Vassar also appear to disapprove of this argument, so it seems clear that disputing this argument is not the same as arguing against existential risk reduction charities.

For the past few years we have considered catastrophic risk reduction charities to be lower on GiveWell's priority list for investigation than developing-world aid charities, but still relatively high on the list in the scheme of things. I've recently started investigating these causes a bit more, starting with SIAI (see LW posts on my discussion with SIAI representatives and my exchange with Jaan Tallinn). It's plausible to me that asteroid risk reduction is a promising area, but I haven't looked into it enough (yet) to comment more on that.

My informal objections to what I term EEV. Several have criticized the section of my post giving informal objections to what I term the EEV approach (by which I meant explicitly estimating expected value using a rough calculation and not performing a Bayesian adjustment). This section was intended only as a very rough sketch of what unnerves me about EEV; there doesn't seem to be much dispute over the more formal argument I made against EEV; thus, I don't plan on responding to critiques of this section.

Comment by holdenkarnofsky on Efficient Charity: Do Unto Others... · 2011-01-03T21:43:55.866Z · LW · GW

Patrissimo, fair enough. I was thinking that voters can't vote with the same degree of knowledge of the existing situation that they can have with blood donations. Arguments over TDT certainly seem more relevant to voting than to blood donations. But you are right that voters have lots of relevant information about the likely distribution of votes that can be productively factored into their decisions regardless of the TDT debate. Glad to hear you're a fan of GiveWell.

Comment by holdenkarnofsky on Efficient Charity: Do Unto Others... · 2010-12-29T17:01:06.407Z · LW · GW

This is Holden Karnofsky, the co-Executive Director of GiveWell, which is referenced in the top-level article and elsewhere on this thread.

I think there is an important difference between discussing the marginal impact of a blood donation and the marginal impact of a vote. When it comes to blood donations, it is possible for everyone to simultaneously follow the rule: "Give blood only when the supply of donations is low enough that an additional donation would have high expected impact", with a reasonable outcome. It is not possible for everyone to behave this way in elections: no voter is able to consider the existing distribution of votes before casting their own.

I am only casually familiar with TDT/UDT, but it seems to me that following the rule "Give blood only when the supply of donations is low enough that an additional donation would have high expected impact" should get about the same amount of credit under TDT/UDT as giving blood, and thus the extra impact of actually giving blood (as opposed to following that rule) is small regardless of what decision theory one is using.

I bring this up because the discussion of marginal blood donations is parallel to analysis GiveWell often does of the marginal impact of donations. We do everything we can to understand the marginal (not average) impact of a donation and recommend organizations on this basis, and we believe this is a very important and unique element of what we offer (more on this issue). We try to push donors to underfunded charities and away from overfunded ones, and I do not think the validity of this depends on any controversial (even controversial-within-Less-Wrong) view on decision theory, though I am open to arguments that it does.

Comment by holdenkarnofsky on Efficient Charity: Do Unto Others... · 2010-12-29T16:39:24.672Z · LW · GW

This is Holden Karnofsky, the co-Executive Director of GiveWell. As a frequent Less Wrong reader, I'm really glad to see the thoughtful discussion here. Thanks to Yvain for calling attention both to GiveWell and to the general topic of effective giving.

First off, much of this content overlaps with our own, so people interested in this thread might also find the following links interesting:

I'm mostly posting to clarify a few things regarding the concerns that have been raised about GiveWell (by aeschenkarnos).

  • We regret the astroturfing that aeschenkarnos brought up. This incident is disclosed, along with other mistakes we've made, on our shortcomings list, which is accessible via a top-level link on our navigation bar.
  • Regarding the split between grants to charities and funds spent on our own operations:
    • Early in our existence, we relied on making grants of our own to charities. We weren't able to point them to any benefits that would come from our recommendations (since we were new and had no track record of influencing donations), so rather than inviting them to be reviewed, we invited them to apply for grants (subject to certain conditions such as public disclosure of application materials). Grantmaking is no longer important to our process and we no longer solicit donations to be regranted, though we still occasionally receive them. That explains why the % of our funds spent on grants has fallen a lot, though it hasn't hit zero.
    • At this point, we actively solicit donations to GiveWell only when dealing with institutional funders or with people who have a relationship with us. When dealing with the general public, we put the solicitation on behalf of recommended charities - rather than ourselves - front and center. Our top charities page, linked prominently from our front page and navigation bar and in other places throughout the site, links to "donate" pages for top charities (here's the one for our top-rated charity VillageReach) that allow us to track donations, but otherwise take no part in the donation process (the money does not touch our bank account). These "donate" pages also are linked from charity reviews. The only way to get to the "Donate to GiveWell" page is under "About GiveWell." If donors make a considered decision to support us rather than our top charities, we want them to be able to do so, but our site is designed to push the casual user to our top charities.
    • In 2009 we tracked ~$1 million in donations to our top charities as a result of our research, while our own operating (non-grant) expenses were under $300k. We expect 2010 to have a higher "donations to top charities" figure on similar operating expenses. We are still new and hope the ratio will improve substantially over time.
    • We have a policy of regranting unrestricted funds if our reserves go above a certain level; we don't believe in building a massive endowment for ourselves. This is the only condition under which we regrant unrestricted funds. We don't want donors to fear that we might blindly pile up reserves without limit (we won't), but we don't want to get into all the details of our "Excess reserves" policy on the Donate page, so we went with the language: "we may use these funds for operating expenses or grants to charities, at our discretion."
    • Bottom line - grantmaking used to be an important part of what we do but it isn't now; the % of our funds spent on grants is not a meaningful figure.
  • Regarding Charity Navigator:
    • I believe Yvain is correct to say that Charity Navigator does not evaluate effectiveness (and admits this) and that GiveWell does. See also this recent New York Times article on planned changes at Charity Navigator and Charity Navigator's disclosure of the full details of its current methodology.
    • I agree with alexanderis that "number of charities rated" is higher for Charity Navigator primarily because its research is not as in-depth. I believe Charity Navigator would agree with this as well.
    • I believe that Charity Navigator has a significantly higher profile than GiveWell, overall, and know of no evidence suggesting otherwise. However, GiveWell does have a higher profile within certain communities, including Less Wrong. I attribute our higher profile on Less Wrong to specific individuals including Michael Vassar, Anna Salomon, Carl Shulman, Razib at GNXP, and multifoliaterose. I don't believe any of these individuals have plugged GiveWell in ignorance of Charity Navigator (in fact I have probably discussed the differences specifically with each of them).

We've worked to find the best, most cost-effective charities (in terms of actual impact per marginal dollar) and write up all the details of our analysis. We welcome more comments and questions about our work, whether here, on our blog, or via email.