AidGrade - GiveWell finally has some competition

post by Raemon · 2013-01-22T15:41:42.891Z · LW · GW · Legacy · 28 comments

AidGrade is a new charity evaluator that looks to be comparable to GiveWell. The primary difference is that AidGrade focuses *only* on how charities compare along particular measured outcomes (such as school attendance, birthrate, chance of opening a business, or malaria incidence), without attempting comparisons between charities that target different types of outcomes. (This yields interesting results like "Conditional Cash Transfers and Deworming are better at improving attendance rates than scholarships.")

GiveWell also does this, but designs its site to direct people toward its top charities. That is better for people who don't have the time to do the (fairly complex) work of comparing charities across domains; AidGrade aims to be better for people who just want the raw data and the ability to form their own conclusions.

I haven't looked into it enough to compare the quality of the two organizations' work, but I'm glad we finally have another organization in the space, to encourage some competition and dialogue about different approaches.

This is a fun page to play around with to get a feel for what they do:
http://www.aidgrade.org/compare-programs-by-outcome

And this is a blog post outlining their differences with GiveWell:
http://www.aidgrade.org/uncategorized/some-friendly-concerns-with-givewell

28 comments

Comments sorted by top scores.

comment by paulfchristiano · 2013-01-22T18:42:54.367Z · LW(p) · GW(p)

I'm also glad to see competition, and I would not be surprised at all if their overviews of the evidence were stronger than GiveWell's. It would be nice to see exactly what they are doing, though, and I don't trust their results too much without that. I gather there is a book where they describe the meta-analyses they've done, which I have not had a chance to see.

The comparison with GiveWell seems mostly unreasonable, and I think it reflects somewhat badly on AidGrade. Most of the points are either mistaken or misleading, and I would be surprised if they could be made by someone writing in good faith. [Edit: apologies for the snarky tone and inaccurate claim!]

  • They suggest GiveWell doesn't understand causal attribution because Holden lists instrumental variables at the top of an inventory of methods of causal attribution. However, the list is in alphabetical order, and Holden says specifically that they rarely see compelling studies of this form.
  • They criticize GiveWell for vote counting and provide a detailed description of what vote counting is. They link to GiveWell's analysis of microfinance as their only example, and suggest it makes elementary statistical errors. The procedure GiveWell actually follows is to look at a variety of studies with methodological failures, point out that the positive results come from the studies with the largest methodological failures, and note that there is strong evidence those failures will exaggerate the positive impact of microfinance. It is hard to mistake this for vote counting in good faith. A study showing that eliminating methodological flaws reduces measured positive impacts does constitute evidence undermining other methodologically flawed studies that find a positive effect. This is not the same as "accepting the null." (A toy illustration of the difference between vote counting and pooling appears after this list.)
  • AidGrade posts the comparison on their blog but doesn't take responsibility for it (disclaiming: these are the views of the author) or allow responses there.
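
To make the distinction concrete, here is a minimal sketch of the difference between vote counting and standard inverse-variance pooling, with invented numbers; it illustrates the general methods only, not GiveWell's or AidGrade's actual procedures:

```python
import math

# All effect estimates and standard errors below are invented.
studies = [  # (effect estimate, standard error)
    (0.10, 0.08),
    (0.05, 0.06),
    (0.30, 0.10),
    (0.02, 0.05),
]

# Vote counting: tally how many studies are individually significant at ~5%.
votes = sum(1 for est, se in studies if abs(est) > 1.96 * se)
print(f"individually significant: {votes} of {len(studies)} studies")

# Fixed-effect meta-analysis: weight each study by 1/se^2 and pool.
weights = [1.0 / se ** 2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
print(f"pooled effect: {pooled:.3f} (se {pooled_se:.3f})")

# Here only 1 of 4 studies is individually significant, so a vote counter
# might "accept the null" -- yet the pooled estimate (0.072, se 0.033)
# is significantly positive.
```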

Eva Vivalt thinks that making comparisons between outcome measures is not the place of a charity evaluator, and faults GiveWell for being willing to do so. No argument is provided for this, nor any response to the arguments GiveWell has given in favor of making such judgments.

This seems like a common and defensible position, but as an altruist concerned with aggregate welfare it doesn't make much sense to me. Yes, there is value in producing the raw estimates of effectiveness on respective outcome measures (which GiveWell does as well), but encouraging discussion about what outcome measures are important is also a valuable public good, and certainly not an active disservice.

@Raemon: saying this is better for "people with choice paralysis or who don't have any idea how to evaluate different types of outcomes" seems to be missing the point. It is a significant, largely empirical challenge to determine which intermediate outcome measures most matter for the things we ultimately care about. Whether or not GiveWell does that passably, it is clearly something which needs to be done and which individual donors are not well-equipped to do.

The two valid points of criticism Eva raises:

  • GiveWell is willing to accept data of the form "these two graphs look pretty similar" when common-sense reasoning suggests that the similarity reflects a causal influence. At best, GiveWell's willingness to use such data requires donors to place more trust in them. At worst, it causes GiveWell to make mistakes. Such data was misleading once in the past, indeed in the very example GiveWell cites of this approach. That said, a formal statistical analysis wouldn't have helped at all if GiveWell was willing to accept difference-in-differences estimators or matching [see the next point]. Overall this seems like a valid criticism of GiveWell, although I think GiveWell's position is defensible and has been defended, while this criticism includes no argument other than an appeal to incredulity. Indeed, even the incredulity is incorrectly described, as it applies verbatim to difference-in-differences estimators and matching as well, which economists do accept and which the author chastises GiveWell for not explicitly listing in their inventory.
  • GiveWell fails to note difference-in-differences or matching in their list of methods of causal attribution. This is technically valid but a bit petty, given that GiveWell does in fact use such estimators when available. Making an error of omission in an expository blog post probably does not license the conclusion "[GiveWell is] not in a good position to evaluate studies that did not use randomization." [Edit: as Eva points out, GiveWell is in fact in a worse position to evaluate studies that don't use randomization, though I don't think the evidence presented is very relevant to this claim and I think Eva overstates it significantly.] (For readers unfamiliar with the estimator at issue, a minimal worked example follows this list.)
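
Here is that minimal difference-in-differences calculation; all numbers are invented for illustration:

```python
# Toy difference-in-differences estimate; all numbers are invented.
treated_pre, treated_post = 40.0, 55.0   # treated group's mean outcome
control_pre, control_post = 42.0, 50.0   # comparison group's mean outcome

# The comparison group's change stands in for what would have happened to
# the treated group without the program (the parallel-trends assumption).
did = (treated_post - treated_pre) - (control_post - control_pre)
print(f"difference-in-differences estimate: {did:+.1f}")  # prints +7.0
```
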
Replies from: Eva, ESRogs, feanor1600, Douglas_Knight
comment by Eva · 2013-01-23T01:55:31.278Z · LW(p) · GW(p)

Eva Vivalt here. I was sent the link to this discussion and am happy to discuss, though I won't have time to really go back and forth: as you can imagine there is a big time crunch, and replying to people on the internet who want more things to be up is not an effective way of getting things up! :) (That may not be your main concern, but it is mine.)

Ironically, I think the points characterizing the post as mistaken or misleading are, well, mistaken or misleading. Responding to the bullets in turn:

  • "They suggest GiveWell doesn't understand causal attribution because Holden lists instrumental variables at the top of an inventory of methods of causal attribution." Not really. That's only the tiniest of parts of the issue, and if it's alphabetical, remove that part and the majority still holds.
  • There is value to literature reviews, but at the end of the day you're usually going to want to do some quantitative analysis.
  • You can indeed reply to the post; you just need to log in to comment, which keeps out the 8,000+ spam comments one otherwise gets. If something doesn't work, let me know and I'll pass that on.

The argument isn't that making comparisons between outcome measures is not the place of a charity evaluator, it is that if you are going down that route you had better have a good basis for your comparisons. I would agree that discussing what outcome measures are important is valuable. That's precisely my point - imho, GiveWell is not "encouraging discussion about what outcome measures are important" but rather discouraging it by providing recommendations that don't acknowledge that there is an issue here. I'm told they used to highlight this more and shifted because people wanted a simple number, but that's not obvious to the casual reader.

Not sure how this adds: "Also, saying this is better for 'people with choice paralysis or who don't have any idea how to evaluate different types of outcomes' seems to be missing the point. It is a significant, largely empirical challenge to determine which intermediate outcome measures most matter for the things we ultimately care about. Whether or not GiveWell does that passably, it is clearly something which needs to be done and which individual donors are not well-equipped to do." For the record, as it may not be clear, "people with choice paralysis or who don't have any idea how to evaluate different types of outcomes" aren't my words. Agree that the issue is important, disagree they do a good job at it, and therein lies the rub.

My impression was that they themselves would agree they are not in a great position to evaluate studies that did not use randomization.

Replies from: paulfchristiano, paulfchristiano
comment by paulfchristiano · 2013-01-23T06:53:16.224Z · LW(p) · GW(p)

I think getting things up is the right priority! (And I'm glad you are doing this and don't mean to discourage you at all, though I was annoyed by this post.)

Sorry I accused you of closing comments. Trying to block spam comments is completely understandable, though I hope it's also understandable that a comments section reading "Comments are closed" was confusing :) Edit: actually I just logged in, and still can't comment.

I agree that there is room for quantitative analysis, and I agree that you are better positioned to provide that than GiveWell (I made a brief concession to this at the start, which perhaps should have been longer but for my unjustified ill humor). I agree that GiveWell lacks staff with relevant skills, but I think the evidence you cite is weak (mostly errors of omission in an expository blog post) and you overstate the case.

I think that in cases like the microfinance meta-analysis, where there are in fact big confounds, GiveWell's take is more reliable than most meta-analyses (and without seeing your meta-analysis, I would default to trusting GiveWell over you in this case). I think disparaging their approach as vote counting is misleading and places too much confidence in the methodology of your meta-analyses. I'm prepared to be surprised, but the empirical track record of meta-analyses is simply not that good.

Yes, the comment about "choice paralysis" was a response to Raemon. I forget that in a different context that may look like misattribution, sorry about that.

GiveWell makes recommendations. It seems like at the end of the day, people need to donate, and GiveWell's judgment about how to weigh intermediates is better than the average donor's. So it seems like they are making the right call there (it's not coincidental this is what donors wanted, since they understand that GiveWell's judgment on that question is better than their own).

GiveWell also discusses the various intermediates that are being weighed against each other, and their reasoning with respect to those intermediates. I do not think their discussion is great nor their positions solidly justified, and I disagree with them on many points. But I don't see anyone else anywhere who is trying to have that discussion, and it seems like GiveWell is actively encouraging rather than discouraging it (to wit, the community around GiveWell seems to be one of the few places to find serious, reasonable discussion about this issue).

I basically stand by my criticisms, though I do apologize about my tone. I considered editing my original message but think it's better to let it stand. I'll make a more respectable comment at the original. I think the world is probably better for AidGrade's existence, I agree there are gaps in GiveWell's coverage that can be filled (and core services that could be profitably replicated) and I hope that both groups can do better than they would in isolation. I'll be more civil going forward--cheers!

Replies from: Eva, Raemon
comment by Eva · 2013-01-23T15:36:56.358Z · LW(p) · GW(p)

First, thanks to paulfchristiano for the moderation. I'm also trying to be moderate, but it's sometimes hard to gauge one's own tone on the internet.

Now, apologies for replying to numerous points from different people in one post, but I would feel strange posting all over the place here, and this is probably my last post. If people have more questions, it would be helpful to send them directly, and I can try to address them on the blog while multitasking (also so that more people can benefit from the answers; as good as Less Wrong is, I doubt it would be the most appropriate long-term home for the main questions people have about AidGrade): http://www.aidgrade.org/blog.

Re: ygert's "I don't care about how many people are dying of malaria. I just don't. What I do care about is people dying, or suffering, of anything": We're trying to build up to this, but not there yet. Hang on, please. GiveWell in 2013 is also much better than GiveWell 1.0 was.

Just to quickly add: I've also separately been informed that GiveWell's rationale for simplifying was that donors themselves seemed to focus on global health, with reference to section 2 of http://blog.givewell.org/2011/02/04/givewells-annual-self-evaluation-and-plan-a-big-picture-change-in-priorities/. My gut says that if they had picked a different organization as their #1 rated organization, they would see less emphasis on global health, but I can understand wanting to focus on what their main donors supported. It's a fair point -- if QALYs are what people want, that's what people want. But do people really put no weight on education, etc.? If you think of the big philosophers, you don't think of Nussbaum or Singer or whoever else saying okay, QALYs are all that matter. I'm not saying who's right here, but I do think there's a greater diversity of opinion than is being reflected here; the popularity of QALYs might in part be due to the fact that we have a measure for them (as opposed to, e.g., something that aggregates education (EALYs?) or aggregates across all fields or is harder to measure).

Re: meta-analysis -- first, a meta-analysis tool should in principle be weakly better than (at least as good as) looking at any one study. (See: http://www.aidgrade.org/faq/how-will-you-provide-information-on-context.) An advantage of gathering all these data and coding up different characteristics of studies is that it allows easier filtering of studies later on, so people can look at results in different settings; if results vary widely by setting, you can see that, too. (A toy illustration of this kind of filtering follows below.) Second, everything that goes into a literature review of a topic also goes into a meta-analysis, which is more like a superset. So if you don't think a paper was particularly good, for whatever reason, you can flag that and exclude it from the meta-analysis. We have some quality measures, not that you can tell that from what's currently online, unfortunately.
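
As a toy illustration of that kind of filtering and re-pooling (the field names, numbers, and schema below are invented, not AidGrade's actual data):

```python
# Toy illustration of filtering coded studies before pooling.
# Field names and numbers are invented; this is not AidGrade's schema.
studies = [
    {"effect": 0.12, "se": 0.05, "region": "East Africa", "rct": True},
    {"effect": 0.40, "se": 0.15, "region": "South Asia",  "rct": False},
    {"effect": 0.08, "se": 0.04, "region": "East Africa", "rct": True},
]

def pooled_effect(subset):
    """Fixed-effect (inverse-variance weighted) pooled estimate."""
    weights = [1.0 / s["se"] ** 2 for s in subset]
    return sum(w * s["effect"] for w, s in zip(weights, subset)) / sum(weights)

# Because each study's characteristics are coded, users can re-pool on
# whatever slice matches their context -- e.g. only randomized trials:
rcts = [s for s in studies if s["rct"]]
print(f"pooled (RCTs only):   {pooled_effect(rcts):.3f}")
print(f"pooled (all studies): {pooled_effect(studies):.3f}")
```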

My overall impression is that since GiveWell has quite rightly been supported by pretty much everyone who cares about aid and data, it's particularly hard to say anything that's different. Hardly anyone has any tribal affiliations to AidGrade yet; relatively speaking, there's the unknown, etc. But while I feel the concern (and excitement) here has been from people considering AidGrade as a competitor, I would like to point out that each stands to benefit from the other as well. (Actually, now I see that paulfchristiano makes that point as well.)

And on that note, I'll try to bow out / carry on the conversation elsewhere.

comment by Raemon · 2013-01-23T14:53:33.177Z · LW(p) · GW(p)

Apologies to everyone involved for the "choice paralysis" line. It was (I thought, rather obviously) an exaggeration. To be clear: I myself rely on GiveWell, not to identify the best charity for me, but to establish a lower bound on what "the most effective charity" might be, which I can compare to my best efforts at reviewing more-difficult-to-evaluate-but-probably-higher-impact charities (like, say, CFAR). And this is neither "choice paralysis" nor "not having any idea what to do." I'll change the OP to be less flippant.

I want to express my thanks to Eva for starting this project, and wish her luck with getting more research done and the website updated with more content in the days/weeks/months to come.

comment by paulfchristiano · 2013-01-23T07:22:28.713Z · LW(p) · GW(p)

On your 'about' page as well as in the linked article, you criticize GiveWell for vote counting. The example you cite of this is their microfinance review. I don't know how solid this review was, but there are at least plausible reasons for treating the null results as negative evidence in this case, and I would bet on GiveWell's analysis over your meta-analysis but not confidently.

Do you stand by the claim that GiveWell's analysis is badly flawed and that donors should trust your meta-analysis of microfinance instead? If so, I'll look into the case more closely and update my views accordingly.

comment by ESRogs · 2013-01-22T20:48:20.184Z · LW(p) · GW(p)

I was also surprised by the tone of that post. Thank you for this excellent analysis.

comment by feanor1600 · 2013-01-28T13:59:44.922Z · LW(p) · GW(p)

From the same page: "Instrumental variables are rarely used and have generally become viewed with suspicion; their heyday was the 1980s." This is simply not true, at least within economics. Look at any recent econometrics textbook, or search for "instrumental variables" in EconLit and notice that the number of hits grows every year from 1970 to the present.

comment by Douglas_Knight · 2013-01-23T06:22:46.698Z · LW(p) · GW(p)

Do not attribute to malice...

You are badly calibrated with respect to "good faith."

Replies from: paulfchristiano
comment by paulfchristiano · 2013-01-23T06:34:28.224Z · LW(p) · GW(p)

You're right of course (depending on how narrowly one construes "good faith").

Replies from: wedrifid
comment by wedrifid · 2013-01-23T07:45:22.943Z · LW(p) · GW(p)

:s/ly c/ly one c/

comment by jimrandomh · 2013-01-23T01:18:41.483Z · LW(p) · GW(p)

This is excellent! I haven't judged the quality of their evaluations or how much overlap there is between them and GiveWell, but all that aside, this changes the naive-donor pitch from

"Before donating, you should find out which is the most effective charity by checking a charity evaluator such as GiveWell"

to

"Before donating, you should find out which is the most effective charity by checking a charity evaluator such as GiveWell or AidGrade"

That is, having two credible charity effectiveness evaluators makes it possible to pitch the idea of charity effectiveness evaluation without having to also pitch for a specific organization, which implicitly validates the idea and makes it less political.

Replies from: Qiaochu_Yuan
comment by Qiaochu_Yuan · 2013-01-23T09:59:50.169Z · LW(p) · GW(p)

Good point! But having too many charity effectiveness evaluators might be bad ("who evaluates the charity evaluators?"). Not that I think this is likely to be a problem.

Replies from: None, EricHerboso
comment by [deleted] · 2013-01-23T18:18:20.587Z · LW(p) · GW(p)

GiveToGiveWellWell?

comment by EricHerboso · 2013-01-24T19:15:03.540Z · LW(p) · GW(p)

Case in point: Charity Navigator, which places unreasonable importance on irrelevant statistics like administrative overhead. There are already charity effectiveness evaluators out there that are doing counter-productive work.

Personally, I think adding another good charity evaluator to the mix as competition to GiveWell/Giving What We Can is important to the overall health of the optimal philanthropy movement.

comment by Oscar_Cunningham · 2013-01-22T18:58:12.682Z · LW(p) · GW(p)

I knew this would happen. Now I need a charity to tell me which of GiveWell and AidGrade is more effective!

Replies from: Pablo_Stafforini, Kawoomba
comment by Pablo (Pablo_Stafforini) · 2013-01-23T03:57:18.773Z · LW(p) · GW(p)

AidGrade only compares "like with like", so it could only assess its own performance against GiveWell's. Because GiveWell is not so constrained, it could in principle examine how either of these meta-charities fares not just against each other, but against top first-order charities such as the Against Malaria Foundation.

comment by Kawoomba · 2013-01-22T19:41:35.807Z · LW(p) · GW(p)

Race to the top of the meta mountain.

Replies from: DaFranker
comment by DaFranker · 2013-01-22T19:50:39.894Z · LW(p) · GW(p)

Ferengi Rule of Acquisition #22: "A wise man can hear profit in the wind."

comment by ygert · 2013-01-23T09:07:21.894Z · LW(p) · GW(p)

It is good that there are more organizations in this important area. However, the set of outcomes they list seems very strange.

Let's put it this way: I don't care about how many people are dying of malaria. I just don't. What I do care about is people dying, or suffering, of anything. That is why I find AidGrade's approach to be (almost) completely useless. The outcome I care about is maximizing QALYs, or maybe some similar measure, and I don't care about the listed outcomes at all, except insofar as optimizing them may help people not suffer and die. Basically, AidGrade tries to help with our instrumental goals, and that is well and fine, but in the end what we are trying to optimize are our terminal goals, and AidGrade doesn't help with those at all.

Replies from: EricHerboso
comment by EricHerboso · 2013-01-24T17:33:15.380Z · LW(p) · GW(p)

I agree with the spirit of this comment, but I think you are perhaps undervaluing the usefulness of helping with instrumental goals.

I am a huge fan of GiveWell/Giving What We Can, but one of the problems many outsiders have with them is that they seem to have already made subjective value judgments about which things are more important. Remember that not everyone is into consequentialist ethics, and some object to the very concept of using QALYs.

Such people, when they first decide to start comparing charities, will not look at GiveWell/GWWC. They will look at something atrocious, like Charity Navigator. They will actually prefer Charity Navigator, since CN doesn't introduce subjective value judgments, but just ranks by unimportant yet objective stuff like overhead costs.

Though I've only just browsed their site, I view AidGrade as a potential way to reach those people: the people who want straight numbers; people who maybe aren't utilitarians, but recognize anyway that saving more is better than saving less, and so would use AidGrade to direct their funding to a better charity within whatever category they were going to donate to anyway. These people may not be swayed by traditional optimal-philanthropy groups' arguments for mosquito nets over HIV drugs. But by listening to AidGrade, perhaps they will at least redirect their funding from bad charities to better charities within whatever category they choose.

comment by DanielLC · 2013-01-22T18:55:45.163Z · LW(p) · GW(p)

There's also Giving What We Can.

Replies from: Raemon, Pablo_Stafforini
comment by Raemon · 2013-01-22T19:22:35.578Z · LW(p) · GW(p)

They also serve a useful function, but so far they mostly rely upon GiveWell's recommendations.

Replies from: EricHerboso
comment by EricHerboso · 2013-01-23T00:55:43.995Z · LW(p) · GW(p)

That speaks in GWWC's favor, I think. It would be odd for them not to take into account research done by GiveWell.

Remember that they don't agree on everything (e.g., cash transfers). When they do agree, I take it as evidence that GWWC has looked into GiveWell's recommendation and found it to be a good analysis. I don't really view it as parroting, which your comment might unintentionally imply.

comment by Pablo (Pablo_Stafforini) · 2013-01-23T03:56:41.608Z · LW(p) · GW(p)

There's also Effective Animal Activism, but although both GiveWell and EAA are meta-charities, there is no overlap: one focuses on human charities, whereas the other focuses on animal organizations.

comment by juliawise · 2013-01-26T14:39:54.333Z · LW(p) · GW(p)

I'm trying to figure out how to use this site. It seems like I pick some metrics I care about - maybe height-for-age for starters. So it shows me that deworming did better than school meals at increasing height-for-age. But I don't know what either of those interventions costs. So if I have $100 that I want to use to increase height-for-age, I'm still not sure which intervention gets me more improvement for my money. (The toy calculation below illustrates the problem.)
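
To make the gap concrete, here is a toy calculation; the intervention names are kept for readability, but every effect size and cost below is invented:

```python
# Toy cost-effectiveness comparison; all effects and costs are invented.
# Effect sizes are in standard deviations of height-for-age, per child.
interventions = {
    "deworming":    {"effect_sd": 0.25, "cost_usd": 25.0},
    "school_meals": {"effect_sd": 0.10, "cost_usd": 1.0},
}

budget = 100.0
for name, d in interventions.items():
    children_reached = budget / d["cost_usd"]
    total_sd = children_reached * d["effect_sd"]
    print(f"{name}: {total_sd:.1f} total SD of improvement for ${budget:.0f}")

# Here deworming has the larger per-child effect, but at these invented
# costs school meals buys far more total improvement -- so effect sizes
# alone, without costs, cannot answer the "$100" question.
```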

I'm not sure how to pick which metrics to care about - is height-for-age more important than weight-for-age? Neonatal deaths? Labor force participation? I guess one answer is that nobody is all that sure about which metrics to care about, and this is an approach that gives donors more data on specific interventions if they have different priorities than, say, GiveWell's.

The "portfolio" part isn't up yet, but I assume it will eventually connect me to some organizations that do those various interventions. I'll be interested to see which organizations they recommend.

Replies from: Raemon
comment by Raemon · 2013-01-26T17:09:18.530Z · LW(p) · GW(p)

Their point (I assume) is that any kind of comparison between metrics invites some level of subjectivity, and they want to make the raw data as easy to work with as possible. I agree with this sentiment (insofar as there should be at least one organization doing that), but I also agree that their presentation of the raw data leaves a lot out in terms of what you actually get for your money and which organizations are good at delivering it.

I'm going to wait a few months for them to flesh out their site before I criticize them too heavily.

Replies from: juliawise
comment by juliawise · 2013-01-27T03:06:11.394Z · LW(p) · GW(p)

Fair enough. It does seem to have potential, and we're a hard crowd to please.