The Wicked Problem Experience


post by HoldenKarnofsky · 2022-03-02T17:50:18.621Z · LW · GW · 6 comments

  Flashback to 2007 GiveWell
    Hitting a wall.
  What’s going wrong here?
  Footnotes

I’ve spent a lot of my career working on wicked problems: problems that are vaguely defined, where there’s no clear goal for exactly what I’m trying to do or how I’ll know when or whether I’ve done it.

In particular, minimal-trust investigations - trying to understand some topic or argument myself (what charity to donate to, whether civilization is declining, whether AI could make this the most important century of all time for humanity), with little reliance on what “the experts” think - tend to have this “wicked” quality:

I could spend my whole life learning about any subtopic of a subtopic of a subtopic, so learning about a topic is often mostly about deciding how deep I want to go (and what to skip) on each branch.
There aren’t any stable rules for how to make that kind of decision, and I’m constantly changing my mind about what the goal and scope of the project even is.

This piece will narrate an example of what it’s like to work on this kind of problem, and why I say it is “hard, taxing, exhausting and a bit of a mental health gauntlet.”

My example is from the 2007 edition of GiveWell. It’s an adaptation from a private doc that some other people who work on wicked problems have found cathartic and validating.

It’s particularly focused on what I call the hypothesis rearticulation part of investigating a topic (steps 3 and 6 in my learning by writing process), which is when:

I have a hypothesis about the topic I’m investigating.
I realize it doesn’t seem right, and I need a new one.
Most of the things I can come up with are either “too strong” (it would take too much work to examine them satisfyingly) or “too weak” (they just aren’t that interesting/worth investigating).
I need to navigate that balance and find a new hypothesis that is (a) coherent; (b) important if true; (c) maybe something I can argue for.

This piece tries to give a sense of what the challenge is like; a future piece will give accumulated tips for navigating it.

Flashback to 2007 GiveWell

Context for those unfamiliar with GiveWell:

In 2007, I co-founded (with Elie Hassenfeld) an organization that recommends evidence-backed, cost-effective charities to help people do as much good as possible with their donations.
When we started the project, we initially asked charities to apply for $25,000 grants, and to agree (as part of the process) that we could publish their application materials. This was our strategy for trying to find charities that could provide evidence about how much they were helping people (per dollar).
This example is from after we had collected information from charities and determined which one we wanted to rank #1, and were now trying to write it all up for our website. Since then, GiveWell has evolved a great deal and is much better than the 2007 edition I’ll be describing here.
(This example is reconstructed from my memory a long time later, so it’s probably not literally accurate.)

Initial “too strong” hypothesis. Elie (my co-founder at GiveWell) and I met this morning and I was like “I’m going to write a page explaining what GiveWell’s recommendations are and aren’t. Basically, they aren’t trying to evaluate every charity in the world. Instead they’re saying which ones are the most cost-effective.” He nodded and was like “Yeah, that’s cool and helpful, write it.”

Now I’m sitting at my computer trying to write down what I just said in a way that an outsider can read - the “hypothesis articulation” phase.

I write, “GiveWell doesn’t evaluate every charity in the world. Our goal is to save the most lives possible per dollar, not to create a complete ranking or catalogue of charities. Accordingly, our research is oriented around identifying the single charity that can save the most lives per dollar spent,”

Hmm. Did we identify the “single charity that can save the most lives per dollar spent?” Certainly not. For example, I have no idea how to compare these charities to cancer research organizations, which are out of scope. Let me try again:

“GiveWell doesn’t evaluate every charity in the world. Our goal is to save the most lives possible per dollar, not to create a complete ranking or catalogue of charities. Accordingly, our research is oriented around identifying the single charity with the highest demonstrated lives saved per dollar spent - the charity that can prove rigorously that it saved the most” - no, it can’t prove it saved the most lives - “the charity that can prove rigorously that ” - uh -

Do any of our charities prove anything rigorously? Now I’m looking at the page we wrote for our #1 charity and ugh. I mean here are some quotes from our summary on the case for their impact: “All of the reports we've seen are internal reports (i.e., [the charity] - not an external evaluator - conducted them) … Neither [the charity]’s sales figures nor its survey results conclusively demonstrate an impact … It is possible that [the charity] simply uses its subsidized prices to outcompete more expensive sellers of similar materials, and ends up reducing people's costs but not increasing their ownership or utilization of these materials … We cannot have as much confidence in our understanding of [the charity] as in our understanding of [two other charities], whose activities are simpler and more straightforward.”

That’s our #1 charity! We have less confidence in it than our lower-ranked charities … but we ranked it higher anyway because it’s more cost-effective … but it’s not the most cost-effective charity in the world, it’s probably not even the most cost-effective charity we looked at …

Hitting a wall. Well I have no idea what I want to say here.

This image represents me literally playing some video game like Super Meat Boy while failing to articulate what I want to say. I am not actually this bad at Super Meat Boy (certainly not after all the time I’ve spent playing it while failing to articulate a hypothesis), but I thought all the deaths would give a better sense for how the whole situation feels.

Rearticulating the hypothesis and going “too weak.” Okay, screw this. I know what the problem was - I was writing based on wishful thinking. We haven’t found the most cost-effective charity, we haven’t found the most proven charity. Let’s just lay it out, no overselling, just the real situation.

“GiveWell doesn’t evaluate every charity in the world, because we didn’t have time to do that this year. Instead, we made a completely arbitrary choice to focus on ‘saving lives in Africa’; then we emailed 107 organizations that seemed relevant to this goal, of which 59 responded; we did a really quick first-round application process in which we asked them to provide evidence of their impact; we chose 12 finalists, analyzed those further, and were most impressed with Population Services International. There is no reason to think that the best charities are the ones that did best in our process, and significant reasons to think the opposite, that the best charities are not the ones putting lots of time into a cold-emailed application from an unfamiliar funder for $25k. Like every other donor in the world, we ended up making an arbitrary, largely aesthetic judgment that we were impressed with Population Services International. Readers who share our aesthetics may wish to donate similarly, and can also purchase photos of Elie and Holden at the following link:”

OK wow. This is what we’ve been working on for a year? Why would anyone want this? Why are we writing this up? I should keep writing this so it’s just DONE but ugh, the thought of finishing this website is almost as bad as the thought of not finishing it.

Hitting a wall.

What do I do, what do I do, what do I do.

Rearticulating the hypothesis and assigning myself more work. OK. I gave up, went to sleep, thought about other stuff for a while, went on a vision quest, etc. I’ve now realized that we can put it this way: our top charities are the ones with verifiable, demonstrated impact and room for more funding, and we rank them by estimated cost-effectiveness. “Verifiable, demonstrated” is something appealing we can say about our top charities and not about others, even though it’s driven by the fact that they responded to our emails and others didn’t. And then we rank the best charities within that. Great.

So I’m sitting down to write this, but I’m kind of thinking to myself: “Is that really quite true? That ‘the charities that participated in our process and did well’ and ‘The charities with verifiable, demonstrated impact’ are the same set? I mean … it seems like it could be true. For years we looked for charities that had evidence of impact and we couldn’t find any. Now we have 2-3. But wouldn’t it be better if I could verify that none of these charities that ignored us have good evidence of impact just sitting around on their website? I mean, we definitely looked at a lot of websites before but we gave up on it, and didn’t scan the eligible charities comprehensively. Let me try it.”

I take the list of charities that didn’t participate in round 1. That’s not all the charities in the world, but if none of them have a good impact section on their website, we’ve got a pretty plausible claim that the best stuff we saw in the application process is the best that is (now) publicly available, for the “eligible” charities in the cause. (This assumes that if one of the applicants had good stuff sitting around on their website, they would have sent it.)

I start looking at their websites. There are 48 charities, and in the first hour I get through 6, verifying that there’s nothing good on any of those websites. This is looking good: in 8 work hours I’ll be able to defend the claim I’ve decided to make.

Hmm. This water charity has some kind of map of all the wells they’ve built, and some references to academic literature arguing that wells save lives. Does that count? I guess it depends on exactly what the academic literature establishes. Let’s check out some of these papers … huh, a lot of these aren’t papers per se so much as big colorful reports with giant bibliographies. Well, I’ll keep going through these looking for the best evidence I can …

“This will never end.” Did I just spend two weeks reading terrible papers about wells, iron supplementation and community health workers? Ugh and I’ve only gotten through 10 more charities, so I’m only about ⅓ of the way through the list as a whole. I was supposed to be just writing up what we found, I can’t take a 6-week detour!

The over-ambitious deadline. All right, I’ll sprint and get it done in a week. [1 week later] Well, now I’m 60% of the way through the whole list. !@#$

“This is garbage.” What am I even doing anyway? I’m reading all this literature on wells and unilaterally deciding that it doesn’t count as “proof of impact” the way that Population Services International’s surveys count as “proof of impact.” I’m the zillionth person to read these papers; why are we creating a website out of these amateur judgments? Who will, or SHOULD, care what I think? I’m going to spend another who knows how long writing up this stupid page on what our recommendations do and don’t mean, and then another I don’t even want to think about it finishing up all the other pages we said we’d write, and then we’ll put it online and literally no one will read it. Donors won’t care - they will keep going to charities that have lots of nice pictures. Global health professionals will just be like “Well this is amateur hour.”¹

This is just way out of whack. Every time I try to add enough meat to what we’re doing that it’s worth publishing at all, the timeline expands another 2 months, AND we still aren’t close to having a path to a quality product that will mean something to someone.

What’s going wrong here?

I have a deep sense that I have something to say that is worth arguing for, but I don’t actually know what I am trying to say. I can express it in conversation to Elie, but every time I start writing it down for a broad audience, I realize that Elie and I had a lot of shared premises that won’t be shared by others. Then I need to decide between arguing the premises (often a huge amount of extra work), weakening my case (often leads to a depressing sense that I haven’t done anything worthwhile), or somehow reframing the exercise (the right answer more often than one would think).
It often feels like I know what I need to say and now the work is just “writing it down.” But “writing it down” often reveals a lot of missing steps and thus explodes into more tasks - and/or involves long periods of playing Super Meat Boy while I try to figure out whether there’s some version of what I was trying to say that wouldn’t have this property.
I’m approaching a well-established literature with an idiosyncratic angle, giving me constant impostor syndrome. On any given narrow point, there are a hundred people who each have a hundred times as much knowledge as I do; it’s easy to lose sight of the fact that despite this, I have some sort of value-added to offer (I just need to not overplay what this is, and often I don’t have a really crisp sense of what it is).
Because of the idiosyncratic angle, I lack a helpful ecosystem of peer reviewers, mentors, etc.
- There’s nothing to stop me from sinking weeks into some impossible and ill-conceived version of my project that I could’ve avoided just by, like, rephrasing one of my sentences. (The above GiveWell example has me trying to do extra work to establish a bunch of points that I ultimately just needed to sidestep, as you can see from the final product. This definitely isn’t always the answer, but it can happen.)
- I’m simultaneously trying to pose my question and answer it. This creates a dizzying feeling of constantly creating work for myself that was actually useless, or skipping work that I needed to do, and never knowing which I’m doing because I can’t even tell you who’s going to be reading this and what they’re going to be looking for.
- There aren’t any well-recognized standards I can make sure I’m meeting, and the scope of the question I’m trying to answer is so large that I generally have a creeping sense that I’m producing something way too shot through with guesswork and subjective judgment to cause anyone to actually change their mind.

All of these things are true, and they’re all part of the picture. But nothing really changes the fact that I’m on my way to having (and publishing) an unusually thoughtful take on an important question. If I can keep my eye on that prize, avoid steps that don’t help with it (though not to an extreme, i.e., it’s good for me to have basic contextual knowledge), and keep reframing my arguments until I capture (without overstating) what’s new about what I’m doing, I will create something valuable, both for my own learning and potentially for others’.

“Valuable” doesn’t at all mean “final.” We’re trying to push the conversation forward a step, not end it. One of the fun things about the GiveWell example is that the final product that came out at the end of that process was actually pretty bad! It had essentially nothing in common with the version of GiveWell that first started feeling satisfying to donors and moving serious money, a few years later. (No overlap in top charities, very little overlap in methodology.)

For me, a huge part of the challenge of working on this kind of problem is just continuing to come back to that. As I bounce between “too weak” hypotheses and “too strong” ones, I need to keep re-aiming at something I can argue that’s worth arguing, and remember that getting there is just one step in my and others’ learning process. A future piece will go through some accumulated tips on pulling that off.

Footnotes

¹ I really enjoyed the “What qualifies you to do this work?” FAQ on the old GiveWell site that I ran into while writing this.

6 comments

Comments sorted by top scores.

comment by Dave Orr (dave-orr) · 2022-03-02T18:18:53.782Z · LW(p) · GW(p)

This post seems like a nice illustration of Paul Graham's latest essay about how you don't understand something until you've written about it.

Writing about something, even something you know well, usually shows you that you didn't know it as well as you thought. Putting ideas into words is a severe test. The first words you choose are usually wrong; you have to rewrite sentences over and over to get them exactly right. And your ideas won't just be imprecise, but incomplete too. Half the ideas that end up in an essay will be ones you thought of while you were writing it.

comment by DirectedEvolution (AllAmericanBreakfast) · 2024-01-12T08:31:36.684Z · LW(p) · GW(p)

This post and its companion have even more resonance now that I'm deeper into my graduate education and conducting my research more independently.

Here, the key insight is that research is an iterative process of re-scoping the project and executing on the current version of the plan. You are trying to make a product sufficient to move the conversation forward, not (typically) write the final word on the subject.

What you know, what resources you have access to, your awareness of what people care about, and what there's demand for, depend on your output. That's all key for the next project. A rule of thumb is that at the beginning, you can think of your definition of done as delivering a set of valuable conclusions such that it would take about 10 hours for any reasonably smart person to find a substantial flaw.

You should keep on rethinking whether the work you're doing (read: the costs you're paying) is delivering as much value, given your current state of knowledge. As you work on the project, and have conversations with colleagues, advisors and users, your understanding of where the value's at and how large the costs of various directions are will constantly update. So you will need to update your focus along with it. Accept the interruptions as a natural, if uncomfortable, part of the process.

Remember that one way or another, you're going to get your product to a point where it has real, unique value to other people. You just need to figure out what that is and stay the course.

The advice here also helps me figure out how to interact with my fellow students when they're proposing excessively costly projects with no clear benefit due to their passion for and interest in the work itself and their love of rigor and design. Instead of quashing their passion or staying silent or being encouraging despite my misgivings, I can say something like "I think this could be valuable in the future once it's the main bottleneck to value, but I think [some easier, more immediately beneficial task] is the way to go for now. You can always do the thing you're proposing at a later time." This helps me be more honest while, I believe, helping them steer their efforts in ways that will bring them greater rewards.

The most actionable advice I got from the companion piece was the idea of making an outline of the types of evidence you'll use to argue for your claims, and get a sign-off from a colleague or advisor on the adequacy of that evidence before you go about gathering it. Update that outline as you go along. I've been struggling with this exact issue and it seems like a great solution to the problem. I'm eager to try it with my PhD advisors.

Edit: as a final note, I think we are very fortunate to have Holden, a co-founder of a major philanthropic organization, describing what his process was like during its formation. Exposition on what he's tracking in his head is underprovided generally and Holden really went above and beyond on this one.

comment by Ruby · 2022-03-16T02:41:04.233Z · LW(p) · GW(p)

Curated. I think much of the work that gets done in the world is the result of either the Streetlight Effect or question substitution, and it's great to see someone writing about how often the real problems aren't like that, and pointing out what the experience is when tackling them. I look forward to subsequent pieces that talk about how to better navigate these Wicked Problems.

comment by Mbp · 2022-03-16T06:33:42.916Z · LW(p) · GW(p)

My head started spinning. I probably would have included information on the audience you were impressing. The conundrum (wicked problem) is that what you wanted to write (truth) is not what donors want to read. Donors sometimes skim through mission statements and it’s now political and who you are connected with. Brands donate as a form of marketing to promote sales. Who is the biggest trophy for our BD and stakeholders. The orgs who need it the most are putting up turbines in Africa and lack the resources to dedicate one grant writing professional on staff. A true philanthropist would already know this.

comment by Pattern · 2022-03-17T18:19:47.929Z · LW(p) · GW(p)

Hmm. This water charity has some kind of map of all the wells they’ve built, and some references to academic literature arguing that wells save lives. Does that count?

Ditch their justification/model. How do wells save lives?

If people fall in and die, that's counter productive. (That's not a reason for 'lives are saved'.)
If there isn't good water there, but well water is clean, then it's plausible.
Do wells offer other benefits? (A source of clean water that's reliable, near a hospital***, that can still be used in the event of a disaster**...) If it's closer so people don't have to share water (containers), so there's less spread...

Try and figure out if they had clean water before. (Note that adding more clean sources does seem good - it might protect against loss of other sources, or allow more people moving in*.)

*And if they're moving from somewhere with worse water, then that seems like an improvement.

**Do disasters mess with access to wells or their water quality? (What disasters do they have in this area?)

***Clean water for cleaning stuff might be important generally. (And if there's not a lot of clean water, maybe it doesn't get used for that.)

comment by rain8dome9 · 2022-03-16T17:16:28.697Z · LW(p) · GW(p)

The internet is filled with BS. There are a million health tracking devices. The most reliable of these are either FDA-certified medical devices, in which case the company that makes them will be punished for misrepresentation, or open source and therefore extremely transparent. Might similar rules apply to charities?
