AI Safety Impact Markets: Your Charity Evaluator for AI Safety

post by Dawn Drescher (Telofy) · 2023-10-01T10:47:06.952Z · LW · GW · 5 comments

This is a link post for https://impactmarkets.substack.com/p/ai-safety-impact-markets


comment by Joe_Collman · 2023-10-02T02:13:07.645Z · LW(p) · GW(p)

Seems interesting and important.
I think it's great that you're working on this!
My thoughts are mostly critical - since I think you're trying something extremely difficult.
But again, it's great that you're trying!

From what I can see the usefulness of this system will all hinge on the evaluation system.

To be clear, I don't mean "it's important that there's a well-thought-through impact evaluation mechanism" (though it is). Rather "it's far from clear we have the collective understanding to produce a fit-for-purpose impact evaluation system".

The first thing I'd look for in a new funding mechanism is: "Is this likely to provide support for important research directions that current mechanisms don't favour?"
I.e. is it usefully shaping the incentive landscape?

My expected answer for something like this is: "no" (without an evaluation system designed with this in mind)
Most current systems already incentivize the pursuit of incremental, reliably artifact-producing, legibly-plausibly-useful research. It's not clear that building more roads to the same destinations helps. (please let me know if it seems that I'm misunderstanding the system)

In particular, [x did this nice thing we can easily measure] may often have the externality [x didn't try this much more important higher variance thing], or [x didn't do this nice thing it's hard to measure].

Then there's an [amplification of the field's existing direction] issue - fine if the field is broadly doing the right thing already; otherwise not. (some mechanic to reward [those who've backed projects that cause other donors to update on seeing the results] might be worth a thought)

 

I don't currently have great thoughts on ways to do better - though it's something I hope to find time to think through more carefully.

However, if there's one key consideration, I think it's that:
[support projects for which there's the best evidence of high (future) impact] and [incentivize researchers to pursue the projects with highest expected impact] do not imply the same approach. ([optimize for (early, reliable) evidence of impact] != [optimize for impact])

If we actually want the latter, we shouldn't be creating multiple systems to do the former.

Replies from: Telofy, lunatic_at_large
comment by Dawn Drescher (Telofy) · 2023-10-02T14:02:36.261Z · LW(p) · GW(p)

Awww, thanks for the input!

I actually have two responses to this: one from the perspective of the current situation – our system in phase 1, very few donors, very little money going around, most donors not knowing where to donate – and one from the perspective of the final ecosystem that we want to see if phase 3 comes to fruition one day – lots of fairly reliable governmental and CSR funding, highly involved for-profit investors, etc.


The second is more interesting but also more speculative. The diagram here [LW · GW] shows both the verifier/auditor/evaluator and the standardization firms. I see the main responsibility as lying with the standardization firms, and that's also where I would like my company to position itself if we reach that stage (possibly including the verification part).

One precedent for that is the Impact Genome. It currently recognizes (by my latest count) 176 kinds of outcomes. They are pretty focused on things that I would class as deploying solutions in global development, but they're already branching out into other fields as well. Extend that database with outcomes like different magnitudes of career plan changes (cf. 80,000 Hours), years of dietary change, new and valuable connections between collaborators, etc., and you'll probably end up with a database of several hundred outcome measures, most of which are not just about publishing in journals. (In the same section I mention some other desiderata that diverge a bit from how the Impact Genome is currently used. That article is generally the more comprehensive and interesting one, but for some reason it got fewer upvotes.)

In this world there's also enough financial incentive for project developers to decide what they want to do based on what is getting funded, so it's important to set sensible incentives.

It's possible that even in this world there'll be highly impactful and important things to do that somehow slip through the cracks. Absent cultural norms around how to attribute the effects of some more obscure kind of action, attempting to monetize it might lead to too many court battles to be worthwhile. I'm thinking of tricky cases that are all about leveraging the actions of others, e.g., vegan outreach work. Currently there are no standards for how to attribute such work (how much reward should the leaflet designer get, how much the activist, how much the new vegan or reducetarian). But over time more and more of those cases will probably get solved as people agree on arbitrary assignments. (Court battles cost a lot of money, and new vegans will not want to financially harm the people who first convinced them to go vegan, so the activist and the leaflet designer are probably in good positions to monetize their contributions and just have to talk to each other about how to split the spoils.)
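As a toy illustration, one standard way to ground such an "arbitrary assignment" is a Shapley-value split of credit between contributors. The sketch below uses a hypothetical leaflet designer and activist; the coalition values are entirely made up, and estimating those counterfactual values is of course the actual hard part:

```python
from itertools import permutations

# Hypothetical characteristic function: value (in arbitrary impact units)
# produced by each coalition of contributors. Numbers are invented for
# illustration only.
value = {
    frozenset(): 0,
    frozenset({"designer"}): 0,               # a leaflet alone changes nothing
    frozenset({"activist"}): 4,               # outreach works a bit without leaflets
    frozenset({"designer", "activist"}): 10,  # together they convert more people
}

def shapley(players, v):
    """Average each player's marginal contribution over all orderings."""
    shares = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            shares[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: s / len(orderings) for p, s in shares.items()}

print(shapley(["designer", "activist"], value))
# designer: (0 + 6)/2 = 3.0, activist: (10 + 4)/2 = 7.0
```

The split rewards the activist more because the activist contributes value even alone, while still crediting the designer for the synergy.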


But we're so so far away from that world. 

In the current world I see three reasons for our current approach:

  1. It's basically on par with how evaluations are done already while making them more scalable.
  2. The counterfactual to getting funded through a system like ours is usually dropping out of AI safety work, not doing something better within AI safety.
  3. If we're successful with our system, project developers will much sooner do small, cheap tweaks to make their projects more legible, not change them fundamentally.

First, my rough impression from the projects on our platform that I know better is that, by default, they get no funding at all or just a barely sufficient baseline from their loyal donors. With Impact Markets, they might get a bit of money on top. The loyal donors are probably usually individuals with personal ties to the founders. The funding that they can get on top comes thanks to their published YouTube videos, blog articles, conference talks, etc. So one funding source rests on friendships; the other on legible performance. But there's no funding from some large donor who is systematically smarter and better connected than our evaluators plus project scout network.

And even really smart funders like Open Phil will look at legible things like the track record of a project developer when making their grant recommendations. If the project developer has an excellent track record of mentioning just the right people and topics to others at conferences, then no one, not Open Phil nor even the person themselves, will be able to take that into account because of how illegible it is.

Second, we're probably embedded in different circles (I'm guessing you're thinking more of academic researchers at university departments where they can do AI safety research?), but in my AI safety circles there are people burning through savings from their previous jobs, maybe some with small LTFF grants, and some who support each other financially or with housing. So by and large, either they get a bit of extra money through Impact Markets and can continue their work for another quarter, or they drop out of AI safety work and go back to their industry jobs. So even if we had enough funding for them, it would just postpone their return to unrelated work, not change what they're doing within AI safety.

A bit surprisingly, maybe, one of our biggest donors on the platform is explicitly using it to look for projects that push for a pause or moratorium on AGI development [LW · GW], largely through public outreach. That can be checked by evaluators through newspaper reports on the protests, and other photos and videos, but it'll be unusually opaque how many people they reached, whether any of them were relevant, and what they took away from it. So far we seem to have fostered fairly illegible activism rather than distracted from it, though admittedly that has happened somewhat randomly – Greg is just really interested in innovative funding methods.

Third, currently the incentives are barely enough to convince project developers to spend 5 minutes to post their existing proposals to our platform, and only in some cases. (In others I've posted the projects for them and then reassigned them to their accounts.) They are not enough to cause project developers to make sure that they have the participants' permission to publish (or share with external evaluators) the recordings of their talks. They're not enough for them to design feedback surveys that shed light on how useful an event was to the participants. (Unless they already have them for other reasons.)

And it makes some sense too: We've tracked $391,000 in potential donations from people who want to use our platform; maybe 10% of those will follow through; divide that by the number of projects (50ish), and the average project can hope for < $1,000. (Our top projects can perhaps hope for $10k+ while the tail projects can probably not expect to fundraise anything, but the probability distribution math is too complicated for me right now. Some project developers might expect a Pareto distribution, under which they'd have to be among the top 3 or so projects for it to matter at all; others might expect more of a log-normal distribution.) Maybe they're even more pessimistic than I am in their assumptions, so I can see why any change that would require a few hours of work does not seem worth it to them at the moment.
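The back-of-the-envelope arithmetic above can be sketched in a few lines. The pledged total, follow-through rate, and project count are taken from the comment; the Pareto-like and log-normal-like allocation shapes are purely illustrative assumptions, not a model of the actual platform:

```python
import random

# Figures from the comment: ~$391k pledged, ~10% follow-through, ~50 projects.
pledged = 391_000
follow_through = 0.10
n_projects = 50

pool = pledged * follow_through
average = pool / n_projects
print(f"Average per project: ${average:,.0f}")  # $782, i.e. < $1,000

# Illustrative allocation shapes: split the same pool according to
# heavy-tailed Pareto draws vs. flatter log-normal draws.
random.seed(0)

def normalized_shares(draws):
    total = sum(draws)
    return sorted((d / total * pool for d in draws), reverse=True)

pareto = normalized_shares([random.paretovariate(1.2) for _ in range(n_projects)])
lognorm = normalized_shares([random.lognormvariate(0, 1) for _ in range(n_projects)])

print(f"Pareto-like:     top ${pareto[0]:,.0f}, median ${pareto[n_projects // 2]:,.0f}")
print(f"Log-normal-like: top ${lognorm[0]:,.0f}, median ${lognorm[n_projects // 2]:,.0f}")
```

Under the Pareto-like split, most of the pool lands on the top handful of projects, which matches the intuition that a pessimistic developer would only bother if they expected to rank near the top.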

If we become a bit more successful in building momentum behind our platform, maybe we can attract 100+ donors with > $1 million in total funding, so that we can present a stronger incentive for project developers. But even then I think what would happen is rather that they'll do such things as design feedback surveys to share with evaluators or record unconference talks to share them etc., but not to fundamentally change what they're doing to make it more provable.

So I think if we scale up by 3–4 orders of magnitude, we'll probably still do a bit better with our system than existing funders (in terms of scaling down, while having similarly good evaluations), but then we'll need to be careful to get various edge cases right. Though even then I don't think mistakes will be path dependent. If there is too little funding for some kind of valuable work, and the standardization firms find out about it, they can design new standards for those niches.

I hope that makes sense to you (and also lunatic_at_large), but please let me know if you disagree with any of the assumptions and conclusions. I see, for example, that even now, post-FTX, people are still talking about a talent constraint (rather than a funding constraint) in AI safety, which I don't see at all. But maybe the situation is different in the US, and we should rebrand to impactmarkets.eu or something! xD

Replies from: Joe_Collman
comment by Joe_Collman · 2023-10-02T21:29:05.210Z · LW(p) · GW(p)

Thanks for the lengthy response.
Pre-emptive apologies for my too-lengthy response; I tried to condense it a little, but gave up!

Some thoughts:

First, since it may help suggest where I'm coming from:

we're probably embedded in different circles

Certainly to some extent, but much less than you're imagining - I'm an initially self-funded AIS researcher who got a couple of LTFF research grants and has since been working with MATS. Most of those coming through MATS have short runways and uncertain future support for their research (quite a few are on PhD programs, but rarely AIS PhDs).

Second, I get the impression that you think I'm saying [please don't do this] rather than [please do this well]. My main point throughout is that the lack of reliable feedback makes things fundamentally different, and that we shouldn't expect [great mechanism in a context with good feedback] to look the same as [great mechanism in a context without good feedback].

To be clear, when I say "lack of reliable feedback", I mean relative to what would be necessary - not relative to the best anyone can currently do. Paul Christiano's carefully analyzing each project proposal (or outcome) for two weeks wouldn't be "reliable feedback" in the sense I mean.

I should clarify that I'm talking only about technical AIS research when it comes to inadequacy of feedback. For e.g. projects to increase the chance of an AI pause/moratorium, I'm much less pessimistic: I'd characterize these as [very messy, but within a context that's fairly well understood]. I'd expect the right market mechanisms to do decently well at creating incentives here, and for our evaluations to be reasonably calibrated in their inaccuracy (or at least to correct in that direction over time).

Third, my concerns become much more significant as things scale - but such scenarios are where you'll get almost all of your expected impact (whether positive or negative). As long as things stay small, you're only risking missing the opportunity to do better, rather than e.g. substantially reducing the odds that superior systems are produced.
I'd be surprised if this kind of system is not net positive while small.
In much of the below, I'm assuming that things have scaled up quite a bit (or it's likely no big deal).

 

On the longer-term, speculative aims:
My worry isn't around obscure actions/outcomes that slip through the cracks. It's around what I consider a central and necessary action: doing novel, non-obvious research, that gets us meaningfully closer to an alignment solution.

My claim is that we have no reliable way to identify or measure this. Further, the impact of most other plausibly useful outcomes hinges on whether they help with this - i.e. whether they increase the chance that [thing we can't reliably measure progress towards] is produced.
So, for example, a program that successfully onboards and accelerates many new AIS researchers only has high impact insofar as it increases the chance that such research actually gets produced.

I think it's more reasonable in an AI safety context to talk about outcomes than about impact: we can measure many different types of outcome. The honest answer to whether something has had net positive impact is likely to remain "we have no idea" for a long time.

With good feedback it's very reasonable to think that [well constructed market mechanism] will do a good job at solving some problem. Without good feedback, there's no reason to expect this to work. There's a reason to think it'll appear to be working - but that's because we'll be measuring our guesses against our guesses, and finding that we tend to agree with ourselves.

The retrospective element helps a little with this - but we're still comparing [our guess with very little understanding] against [our guess with very little understanding and a little more information].
There are areas where it helps a lot - but those are the easy-to-make-legible areas I don't worry about (much!).

It seems very important to consider how such a system might update and self-correct.

 

On the more immediate stuff:

It's basically on par with how evaluations are done already while making them more scalable.

This is a good reason only if you think [scaled existing approach] is a good approach. Evaluations that are [the best we can do according to our current understanding] should not be confused with evaluations that are fairly accurate in an objective sense.
The best evaluations for technical AIS work currently suck. I think it's important to design a system with that in mind (and to progressively aim for them to suck less, of course).

The counterfactual to getting funded through a system like ours is usually dropping out of AI safety work, not doing something better within AI safety.

I think what's important here is [system we're considering] vs [counterfactual system (with similar cost etc)]. So the question isn't whether someone getting funded through this system would drop out if there were no system - rather it's whether there's some other system that's likely to get more people doing something more valuable within AI safety.

If we're successful with our system, project developers will much sooner do small, cheap tweaks to make their projects more legible, not change them fundamentally.

First, I don't think it's reasonable to assume that there exist "small, cheap tweaks" to make the most important neglected projects legible. The projects I'd consider most important are hard to make legibly useful - this is tied in a fundamental way to their importance: they're (attempting) a non-obvious leap beyond what's currently understood.

Second, the best systems will change the incentive landscape so that the kind of projects pursued are fundamentally changed. Unless we think that all the most important directions are already being pursued, it's hugely important to improve the process by which ideas get created and selected.

 

Another point that confuses me:

If there is too little funding for some kind of valuable work, and the standardization firms find out about it, they can design new standards for those niches.

I expect there is hugely valuable work no one is doing, and we don't know what it is (and it's unlikely some meta-project changes this picture much). In such a context, we need systems that make it more likely such work happens even without any ability to identify it upfront, or that quickly notice its importance once it's completed.

I'm not hugely worried about there being inadequate funding for concrete things that are noticeably neglected.

 

I see for example that even now, post-FTX, people are still talking about a talent constraint (rather than funding constraint) in AI safety, which I don't see at all.

I think this depends a lot on one's model of AI safety progress.

For instance, if we expect that we'll make good progress by taking a load of fairly smart ML people and throwing them at plausibly helpful AIS projects, we seem funding constrained.

If most progress depends on researchers on Paul/Eliezer's level, then we seem talent constrained. (Here I don't mean researchers capable of iterating on directions Paul/Eliezer have discovered, but rather those capable of coming up with promising new ones themselves.)

Of course it's not so simple - both since things aren't one-dimensional, and because there's a question as to how effectively funding can develop the kind of talent that's necessary (this seems hard to me).

I also think it depends on one's high-level view of our current situation - in particular of the things we don't yet understand, and therefore cannot concretely see. I think there's a natural tendency to focus on the things we can see, and in those areas there's a lot of opportunity to do incremental work (funding constraint!).
However, if we instead believe that there are necessary elements of a solution that require identification and deconfusion before we can even find the right language to formulate questions, it's quite possible that everything we can see is insufficient (talent constraint!).

Generally, I expect that we're both funding and talent constrained.
I expect that a significant improvement to the funding side of things could be very important.

Replies from: Telofy
comment by Dawn Drescher (Telofy) · 2023-10-04T15:15:06.524Z · LW(p) · GW(p)

Oh, haha! I'll try to be more concise!

Possible crux: I think I put a stronger emphasis on attribution of impact in my previous comment than you do, because to me that seems like both a bit of a problem and solvable in most cases. When it comes to impact measurement, I'm actually (I think) much more pessimistic than you seem to be. There's a risk that EV is just completely undefined even in principle [? · GW], and even if that turns out to be false or we can use something like stochastic dominance instead to make decisions, that still leaves us with a near-impossible probabilistic modeling task.

If the latter is the case, then we can probably improve the situation a bit with projects like the Squiggle ecosystem and prediction markets, but it'll take time (which we may not have) and will be a small improvement. (An approximate comparison: I think we can still do somewhat better than GiveWell, especially by not bottoming out at bad proxies like DALYs and by handling uncertainty more rigorously with Squiggle, and we can do as well as that in more areas. But not much more, probably.)
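As a minimal sketch of what deciding by stochastic dominance (rather than by a point-estimate EV) might look like: the snippet below draws Monte Carlo samples from two hypothetical impact distributions and checks first-order stochastic dominance on an empirical CDF grid. The distributions are invented for illustration, not estimates of any real project:

```python
import random

random.seed(0)

# Hypothetical impact samples for two interventions, in arbitrary
# "outcome points". The log-normal shapes are illustrative assumptions:
# `a` is modest but reliable, `b` has fat tails in both directions.
a = [random.lognormvariate(1.0, 0.5) for _ in range(10_000)]
b = [random.lognormvariate(0.5, 1.5) for _ in range(10_000)]

def dominates(x, y, grid_points=100):
    """Rough first-order stochastic dominance check: x dominates y if
    P(X <= t) <= P(Y <= t) at every threshold t on a shared grid."""
    lo, hi = min(x + y), max(x + y)
    grid = [lo + (hi - lo) * i / grid_points for i in range(grid_points + 1)]
    cdf = lambda s, t: sum(v <= t for v in s) / len(s)
    return all(cdf(x, t) <= cdf(y, t) for t in grid)

print("mean(a) =", sum(a) / len(a), " mean(b) =", sum(b) / len(b))
print("a dominates b:", dominates(a, b))
print("b dominates a:", dominates(b, a))
```

When neither distribution dominates (their CDFs cross), the dominance criterion simply declines to rank them, which is exactly the kind of honest "we can't tell yet" answer a point-estimate EV comparison papers over.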

Conversely, even if we have roughly the same idea how much the passing of time helps in forecasting things, I'm more optimistic about it, relatively speaking.

Might that be a possible crux? Otherwise I feel like we agree on most things, like desiderata, current bottlenecks, and such.

It seems very important to consider how such a system might update and self-correct.

Argh, yeah. We're following the example of carbon credits in many respects, and there are some completely unnecessary issues in that space whose impact-market equivalents we need to prevent. It's too early to think about this now, but when the time comes, we should definitely talk to insiders of that space who have ideas on how it should be changed (but probably can't change it anymore) to prevent the bad incentives that have probably caused those issues.

Another theme in our conversation, I think, is figuring out exactly what and how much the final system should do. Of course there are tons of important problems that need to be solved urgently, but if one system tries to solve all of them, they sometimes trade off against each other. Especially for small startups, it can be better to focus on one problem and solve it well rather than solve a whole host of problems a little bit each.

I think at Impact Markets we have this intuition that experienced AI safety researchers are smarter than most other people when it comes to prioritizing AI safety work, so that we shouldn't try to steer incentives in some direction or other and instead double down on getting them funded. That gets harder once we have problems with fraud and whatnot, but when it comes to our core values, I think we are closer to, “We think you're probably doing a good job and we want to help you,” rather than “You're a bunch of raw talent that wants to be herded and molded.” Such things as banning scammers is then an unfortunate deviation from our core mission that we have to accept. That could change – but that's my current feeling on our positioning.

In such a context, we need systems that make it more likely such work happens even without any ability to identify it upfront, or quickly notice its importance once it's completed.

Nothing revolutionary, but this could become a bit easier. When Michael Aird started posting on the EA Forum, I and others probably figured, "Huh, why didn't I think of doing that?" And then, "Wow, this fellow is great at identifying important, neglected work he can just do!" With a liquid impact market, Michael's work would receive its first investments at this stage, which would create additional credible visibility on the marketplaces, which could cascade into more and more investments. We're replicating that system with our score at the moment. Michael could build a legible track record more quickly through the reputational injections from others, and then he could use that to fundraise for work that no one understands yet.

I expect that a significant improvement to the funding side of things could be very important.

Yeah, and also how to even test what the talent constraint is when the funding constraint screens it off. When funding was flowing better (because part of it was stolen from FTX customers…), did AI safety progress speed up? Do you or others have intuitions on that?

comment by lunatic_at_large · 2023-10-02T03:19:16.440Z · LW(p) · GW(p)

I also suspect that the evaluation mechanism is going to be very important. I can think of philosophical debates whose resolution could change the impact of an "artifact" by many orders of magnitude. If possible I think it could be good to have several different metrics (corresponding to different objective functions) by which to grade these artifacts. That way you can give donors different scores depending on which metrics you want to look at. For example, you might want different scores for x-risk minimization, s-risk minimization, etc. That still leaves the "[optimize for (early, reliable) evidence of impact] != [optimize for impact]" issue open, of course.