Posts

The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023 2023-12-07T09:23:05.018Z
The Top AI Safety Bets for 2023: GiveWiki’s Latest Recommendations 2023-11-11T09:04:27.585Z
Regrant up to $600,000 to AI safety projects with GiveWiki 2023-10-28T19:56:06.676Z
AI Safety Impact Markets: Your Charity Evaluator for AI Safety 2023-10-01T10:47:06.952Z
The Retroactive Funding Landscape: Innovations for Donors and Grantmakers 2023-09-29T17:39:23.874Z
Play Regrantor: Move up to $250,000 to Your Top High-Impact Projects! 2023-05-17T16:51:15.075Z
A Fresh FAQ on GiveWiki and Impact Markets Generally 2023-04-06T14:02:51.419Z
How might cryptocurrencies affect AGI timelines? 2021-02-28T19:16:15.326Z
Self-Similarity Experiment 2020-08-15T13:19:56.916Z

Comments

Comment by Dawn Drescher (Telofy) on The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023 · 2023-12-08T13:22:36.645Z · LW · GW

Does this include all donors in the calculation or are there hidden donors?

Donors have a switch in their profiles where they can determine whether they want to be listed or not. The top three in the private, complete listing are Jaan Tallinn, Open Phil, and the late Future Fund, whose public grants I've imported. The total ranking lists 92 users. 

But I don't think that's core to understanding the step down. I went through the projects around the threshold before I posted my last comment, and I think it's really the 90% cutoff that causes it – not a big donor who has donated to the first 22 but not to the rest.

There are plenty of projects in the tail that have also received donations from a single donor with a high score – but more or less only that, so that said donor has > 90% influence over the project and will be ignored until more donors register donations to it.

Ok so the support score is influenced non-linearly by donor score.

By the inverse rank in the ranking that is sorted by the donor score. So the difference between the top donor and the 2nd-ranked donor is 1 in terms of the influence they have.

Comment by Dawn Drescher (Telofy) on The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023 · 2023-12-08T13:12:44.610Z · LW · GW

It displays well for me!

Comment by Dawn Drescher (Telofy) on The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023 · 2023-12-08T12:25:59.041Z · LW · GW

TL;DR: Great question! I think it mostly means that we don't have enough data to say much about these projects. So donors who've made early donations to them can register those donations and boost the projects' scores.

  1. The donor score relies on the size of the donations and their earliness in the history of the project (plus the retroactive evaluation). So the top donors in particular have made many early, big, and sometimes public grants to projects that panned out well – which is why they are top donors.
  2. What influences the support score is not the donor score itself but the inverse rank of the donor in the ranking that is ordered by the donor score. (This corrects for the outsized influence that rich donors would otherwise have, since I assume that wealth is Pareto-distributed while expertise is probably not, and the two are probably not correlated at quite that extreme a level.)
  3. But if a single donor has more than 90% influence on the score of a project, they are ignored, because that typically means that we don't have enough data to score the project. We don't want a single donor to wield so much power.

Taken together, our top donors have (by design) the greatest influence over project scores, but they are also at a greater risk of ending up with > 90% influence over the project score, especially if the project has so far not found many other donors who've been ready to register their donations. So the contributions of top donors are also at greater risk of being ignored until more donors confirm the top donors' donation decisions.
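For anyone who finds code easier to parse than prose, here's a minimal sketch of the mechanism as I described it above. The function names and aggregation details are made up for illustration (the real support score also weights donation size and earliness, which I omit here); only the inverse-rank weighting and the 90% rule are meant to match:

```python
# Illustrative sketch only, not the actual GiveWiki implementation.
# Assumptions: a donor's influence is their inverse rank in the ranking
# sorted by donor score (top donor of N donors gets N, 2nd gets N-1, ...),
# and any single donor who would account for > 90% of a project's support
# is ignored until more donors register donations to that project.

def donor_influences(donor_scores):
    """Map each donor to their inverse rank in the donor-score ranking."""
    ranked = sorted(donor_scores, key=donor_scores.get, reverse=True)
    n = len(ranked)
    return {donor: n - i for i, donor in enumerate(ranked)}

def project_support(project_donors, influences, cutoff=0.9):
    """Sum the influences of a project's donors, dropping any single donor
    who would contribute more than `cutoff` of the total."""
    total = sum(influences[d] for d in project_donors)
    if total == 0:
        return 0
    return sum(influences[d] for d in project_donors
               if influences[d] / total <= cutoff)

influences = donor_influences({"A": 120.0, "B": 80.0, "C": 5.0})  # A, B, C by donor score
print(project_support(["A"], influences))       # 0 – A alone has 100% influence, so A is ignored
print(project_support(["A", "C"], influences))  # 4 – with a second donor, A's influence counts again
```

So a project backed only by a single top donor scores 0 until at least one more donor registers a donation – which is exactly the step down discussed above.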

Comment by Dawn Drescher (Telofy) on The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023 · 2023-12-08T09:21:36.090Z · LW · GW

"GiveWiki" as the authority for the picker, to me, implied that this was from a broader universe of giving, and this was the AI Safety subset.

Could be… That's not so wrong either. We rather artificially limited it to AI safety for the moment to have a smaller, more sharply defined target audience. It also had the advantage that we could recruit our evaluators from our own networks. But ideally I'd like to find owners for other cause areas too and then widen the focus of GiveWiki accordingly. The other cause area where I have a relevant network is animal rights, but we already have ACE there, so GiveWiki wouldn't add so much on the margin. One person is interested in potentially either finding someone for a global coordination/peace-building branch or taking responsibility for it themselves, but they probably won't have the time. That would be excellent though!

No biggie, but I'm sad there isn't more discussion about donations to AI safety research vs more prosaic suffering-reduction in the short term.

Indeed! Rethink Priorities has made some progress on that. I need to dig into the specifics more to see whether I need to update on it. The particular parameters that they discuss in the article have not been so relevant to my reasoning, but it's quite possible that animal rights wins out even more clearly on the basis of the parameters that I've been using.

Comment by Dawn Drescher (Telofy) on The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023 · 2023-12-07T19:05:22.500Z · LW · GW

It says “AI Safety” later in the title. Do you think I should mention it earlier, like “The AI Safety GiveWiki's Top Picks for the Giving Season of 2023”?

Comment by Dawn Drescher (Telofy) on Davidad's Bold Plan for Alignment: An In-Depth Explanation · 2023-11-23T12:59:46.947Z · LW · GW

Thanks so much for the summary! I'm wondering how this system could be bootstrapped in the industry using less powerful but current-levels-of-general AIs. Building a proof of concept using a Super Mario world is one thing, but what I would find more interesting is a version of the system that can make probabilistic safety guarantees for something like AutoGPT so that it is immediately useful and thus more likely to catch on. 

What I'm thinking of here seems to me a lot like ARC Evals with probably somewhat different processes. Humans doing tasks that should, in the end, be automated. But that's just how I currently imagine it after a few minutes of thinking about it. Would something like that be so far from OAA as to be uninformative toward the goal of testing, refining, and bootstrapping the system?

Unrelated: Developing a new language for the purpose of world modeling would introduce a lot of potential for bugs, and there'd be no ecosystem of libraries. If the language is a big improvement over other functional languages, has good marketing, and is widely used in the industry, then that could change over the course of ~5 years – the bugs would largely get found and an ecosystem might develop – but that seems very hard, slow, risky, and expensive to pull off. Maybe Haskell could do the trick too? I've done some correctness proofs of simple Haskell programs at university, and it was quite enjoyable.

Comment by Dawn Drescher (Telofy) on The Top AI Safety Bets for 2023: GiveWiki’s Latest Recommendations · 2023-11-13T11:37:33.922Z · LW · GW

Hiii! You can toggle the “Show all” switch on the projects list to see all publicly listed projects. We try to only rank, and thereby effectively recommend, projects that are currently fundraising, so projects that have any sort of donation page or widget that they direct potential donors to. In some cases this is just a page that says “If you would like to support us with a donation, please get in touch.” When the project owner adds a link to such a page in the “payment URL” field, the project switches from “Not currently accepting donations” to “Accepting donations” and is visible by default. In the case of Lightcone, we couldn't find any such page.

The Lightcone project is currently still owned by us, which is a stopgap. I see that you already have an account on the platform. Can I assign the project to you so you can add or remove the donation link as you see fit? Thanks!

Comment by Dawn Drescher (Telofy) on AI Safety Impact Markets: Your Charity Evaluator for AI Safety · 2023-10-04T15:15:06.524Z · LW · GW

Oh, haha! I'll try to be more concise!

Possible crux: I think I put a stronger emphasis on attribution of impact in my previous comment than you do, because to me that seems like both a bit of a problem and solvable in most cases. When it comes to impact measurement, I'm actually (I think) much more pessimistic than you seem to be. There's a risk that EV is just completely undefined even in principle, and even if that should turn out to be false or we can use something like stochastic dominance instead to make decisions, that still leaves us with a near-impossible probabilistic modeling task.

If the second is the case, then we can probably improve the situation a bit with projects like the Squiggle ecosystem and prediction markets, but it'll take time (which we may not have) and will be a small improvement. (An approximate comparison might be that I think we can still do somewhat better than GiveWell, especially by not bottoming out at bad proxies like DALYs and by handling uncertainty more rigorously with Squiggle, and that we can do as well as that in more areas. But not much more, probably.)

Conversely, even if we have roughly the same idea how much the passing of time helps in forecasting things, I'm more optimistic about it, relatively speaking.

Might that be a possible crux? Otherwise I feel like we agree on most things, like desiderata, current bottlenecks, and such.

It seems very important to consider how such a system might update and self-correct.

Argh, yeah. We're following the example of carbon credits in many respects, and in that space there are some completely unnecessary issues whose impact-market equivalents we need to prevent. It's too early to think about this now, but when the time comes, we should definitely talk to insiders of the space who have ideas about how it should be changed (but probably can't change it anymore) to prevent the bad incentives that have probably caused those issues.

Another theme in our conversation, I think, is figuring out exactly what or how much the final system should do. Of course there are tons of important problems that need to be solved urgently, but if one system tries to solve all of them, they sometimes trade off against each other. Especially for small startups it can be better to focus on one problem and solve it well rather than solve a whole host of problems a little bit each.

I think at Impact Markets we have this intuition that experienced AI safety researchers are smarter than most other people when it comes to prioritizing AI safety work, so that we shouldn't try to steer incentives in some direction or other but instead double down on getting them funded. That gets harder once we have problems with fraud and whatnot, but when it comes to our core values, I think we are closer to, “We think you're probably doing a good job and we want to help you,” rather than “You're a bunch of raw talent that wants to be herded and molded.” Such things as banning scammers are then an unfortunate deviation from our core mission that we have to accept. That could change – but that's my current feeling on our positioning.

In such a context, we need systems that make it more likely such work happens even without any ability to identify it upfront, or quickly notice its importance once it's completed.

Nothing revolutionary, but this could become a bit easier. When Michael Aird started posting on the EA Forum, I and others probably figured, “Huh, why didn't I think of doing that?” And then, “Wow, this fellow is great at identifying important, neglected work they can just do!” With a liquid impact market, Michael's work would receive its first investments at this stage, which would create additional credible visibility on the marketplaces, which could cascade into more and more investments. We're replicating that system with our score at the moment. Michael could build a legible track record more quickly through the reputational injections from others, and then he could use that to fundraise for stuff that no one understands yet.

I expect that a significant improvement to the funding side of things could be very important.

Yeah, also how to even test what the talent constraint is when the funding constraint screens it off. When the funding was flowing better (because part of it was stolen from FTX customers…), did AI safety progress speed up? Do you or others have intuitions on that?

Comment by Dawn Drescher (Telofy) on AI Safety Impact Markets: Your Charity Evaluator for AI Safety · 2023-10-02T14:02:36.261Z · LW · GW

Awww, thanks for the input!

I actually have two responses to this: one from the perspective of the current situation – our system in phase 1, very few donors, very little money going around, most donors don't know where to donate – and one from the perspective of the final ecosystem that we want to see if phase 3 comes to fruition one day – lots of pretty reliable governmental and CSR funding, highly involved for-profit investors, etc.


The second is more interesting but also more speculative. The diagram here shows both the verifier/auditor/evaluator and the standardization firms. I see the main responsibility with the standardization firms, and that's also where I would like my company to position itself if we reach that stage (possibly including the verification part).

One precedent for that is the Impact Genome. It currently recognizes (by my latest count) 176 kinds of outcomes. They are pretty focused on things that I would class as deploying solutions in global development, but they're already branching out into other fields as well. Extend that database with outcomes like different magnitudes of career plan changes (cf. 80,000 Hours), years of dietary change, new and valuable connections between collaborators, etc., and you'll probably end up with a database of several hundred outcome measures, most of which are not just about publishing in journals. (In the same section I mention some other desiderata that diverge a bit from how the Impact Genome is currently used. That article is generally the more comprehensive and interesting one, but for some reason it got fewer upvotes.)

In this world there's also enough financial incentive for project developers to decide what they want to do based on what is getting funded, so it's important to set sensible incentives.

It's possible that even in this world there'll be highly impactful and important things to do that'll somehow slip through the cracks. Absent cultural norms around how to attribute the effects of some more obscure kind of action, it might lead to too many court battles to even attempt to monetize it. I'm thinking of tricky cases that are all about leveraging the actions of others, e.g., when doing vegan outreach work. Currently there are no standards for how to attribute such work (how much reward should the leaflet designer get, how much should the activist get, how much should the new vegan or reducetarian get). But over time more and more of those will probably get solved as people agree on arbitrary assignments. (Court battles cost a lot of money, and new vegans will not want to financially harm the people who first convinced them to go vegan, so the activist and the leaflet designer are probably in good positions to monetize their contributions, and just have to talk to each other about how to split the spoils.)


But we're so so far away from that world. 

In the current world I see three reasons for our current approach:

  1. It's basically on par with how evaluations are done already while making them more scalable.
  2. The counterfactual to getting funded through a system like ours is usually dropping out of AI safety work, not doing something better within AI safety.
  3. If we're successful with our system, project developers will much sooner do small, cheap tweaks to make their projects more legible, not change them fundamentally.

First, my rough impression from the projects on our platform that I know better is that, by default, they're not getting any funding or just some barely sufficient baseline funding from their loyal donors. With Impact Markets, they might get a bit of money on top. The loyal donors are probably usually individuals with personal ties to the founders. The funding that they can get on top is thanks to their published YouTube videos, blog articles, conference talks, etc. So one funding source is thanks to friendships; the other is thanks to legible performance. But there's no funding from some large donor who is systematically smarter and more well-connected than our evaluators + project scout network.

And even really smart funders like Open Phil will look at legible things like the track record of a project developer when making their grant recommendations. If the project developer has an excellent track record of mentioning just the right people and topics to others at conferences, then no one – not Open Phil, not even the person themselves – will be able to take that into account because of how illegible it is.

Second, we're probably embedded in different circles (I'm guessing you're more thinking of academic researchers at university departments where they can do AI safety research?), but in my AI safety circles there are the people who have savings from their previous jobs that they're burning through, maybe some with small LTFF grants, and some that support each other financially or with housing. So by and large it's either they get a bit of extra money through Impact Markets and can continue their work for another quarter or they drop out of AI safety work and go back to their industry jobs. So even if we had enough funding for them, it would just prevent them from going back to unrelated work for a bit longer, not change what they're doing within AI safety.

A bit surprisingly, maybe, one of our biggest donors on the platform is explicitly using it to look for projects that push for a pause or moratorium on AGI development, largely through public outreach. That can be checked by evaluators through newspaper reports on the protests, and other photos and videos, but it'll be unusually opaque how many people they reached, whether any of them were relevant, and what they took away from it. So far our track record seems to be one of fostering rather illegible activism instead of distracting from it, though admittedly that has happened a bit randomly – Greg is just really interested in innovative funding methods.

Third, currently the incentives are barely enough to convince project developers to spend 5 minutes to post their existing proposals to our platform, and only in some cases. (In others I've posted the projects for them and then reassigned them to their accounts.) They are not enough to cause project developers to make sure that they have the participants' permission to publish (or share with external evaluators) the recordings of their talks. They're not enough for them to design feedback surveys that shed light on how useful an event was to the participants. (Unless they already have them for other reasons.)

And it makes some sense too: We've tracked $391,000 in potential donations from people who want to use our platform; maybe 10% of those will follow through; divide that by the number of projects (50ish), and the average project can hope for < $1,000. (Our top projects can perhaps hope for $10k+ while the tail projects can probably not expect to fundraise anything, but the probability distribution math is too complicated for me right now. Some project developers might expect a Pareto distribution where they'd have to get among the top 3 or so for it to matter at all; others might expect more of a log-normal distribution.) Maybe they're even more pessimistic than I am in their assumptions, so I can see that any change that would require a few hours of work does not seem worth it to them at the moment.
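Spelling out that back-of-the-envelope calculation (purely illustrative – the numbers are just the rough guesses from the paragraph above):

```python
# Rough expected-funding-per-project estimate, using the guesses above.
tracked = 391_000       # tracked potential donations in USD
follow_through = 0.10   # fraction I'd guess actually materializes
n_projects = 50         # roughly how many projects are listed

expected_pot = tracked * follow_through  # ≈ 39,100 USD
print(expected_pot / n_projects)         # ≈ 782 USD per project on average
```

Under a Pareto-style split most of that pot would go to the top handful of projects, so a tail project's expectation would be far below that average; under a log-normal split it would be somewhat less extreme.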

If we become a bit more successful in building momentum behind our platform, maybe we can attract 100+ donors with > $1 million in total funding, so that we can present a stronger incentive for project developers. But even then I think what would happen is rather that they'll do such things as design feedback surveys to share with evaluators or record unconference talks to share them etc., but not to fundamentally change what they're doing to make it more provable.

So I think if we scale up by 3–4 orders of magnitude, we'll probably still do a bit better with our system than existing funders (in terms of scaling down, while having similarly good evaluations), but then we'll need to be careful to get various edge cases right. Though even then I don't think mistakes will be path dependent. If there is too little funding for some kind of valuable work, and the standardization firms find out about it, they can design new standards for those niches.

I hope that makes sense to you (and also lunatic_at_large), but please let me know if you disagree with any of the assumptions and conclusion. I see for example that even now, post-FTX, people are still talking about a talent constraint (rather than funding constraint) in AI safety, which I don't see at all. But maybe the situation is different in the US, and we should rebrand to impactmarkets.eu or something! xD

Comment by Dawn Drescher (Telofy) on The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts) · 2023-08-31T11:46:00.357Z · LW · GW

It would the producer of the public good (e.g. for my project I put up the collateral).

Oh, got it! Thanks!

Possibly? I'm not sure why you'd do that?

I thought you’d be fundraising to offer refund compensation to others to make their fundraisers more likely to succeed. But if the project developer themself puts up the compensation, it’s probably also an important signal or selection effect in the game-theoretic setup.

I disagree that a Refund Bonus is a security.

Yeah, courts decide that in the end. Howey Test: money: yes; common enterprise: yes; expectation of profit: sometimes; effort of others: I don’t know, not really? The size of the payout is not security-like, but I don’t know if that matters. All very unclear.

Profit: I imagine people will collect statistics on how close to funded a campaign can still be a day before it closes such that it still fails in (say) 90% of cases. Then they blindly invest $x into all campaigns that are still $x away from that threshold on their last day.
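To make the “expectation of profit” point concrete, here's a toy expected-value sketch of that strategy – the bonus size and probabilities are placeholders, not claims about any real refund-bonus scheme:

```python
# Toy EV of blindly contributing $x to near-threshold campaigns on their last day.
# Assumes the contributor values the public good at 0 monetarily, so a successful
# campaign is treated as a pure loss of the contribution.

def expected_profit(x, bonus, p_fail):
    """x: contribution; bonus: refund bonus paid out if the campaign fails;
    p_fail: probability the campaign still fails despite the contribution."""
    return p_fail * bonus - (1 - p_fail) * x

print(expected_profit(x=100, bonus=20, p_fail=0.9))  # 8.0 – positive EV, hence the securities worry
```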

I imagine the courts may find that if someone goes to such efforts to exploit the system, they were probably not tricked into doing so. Plus there is the question of what effort of others we could possibly be referring to.

But even if the courts in the end decide that you’re right and it’s not a security, the legal battle with the SEC alone will be very expensive… They keep expanding their own heuristic for what they think is a security (not up to them to decide). They’ve even started to ignore the “expectation of profit” entirely (with stablecoins).

But perhaps you can find a way to keep the people who run the fundraisers in the clear and keep your company in South Africa (where I know the laws even less though). If the fundraisers are on a mainstream blockchain, the transactions are public, so you (outside of the US) could manage the refund compensation on behalf of the project developers and then pay refunds according to the public records on the blockchain. That way, no one could prove that a particular project developer is a member in your system… except maybe if they make “honeypot” contributions I suppose. Perhaps you can have a separate fund from which you reward contributors to projects you like regardless of whether they’re members. If a honeypot contributor gets a refund, they won’t know whether it’s because the project developer is a member of your org or because you selected their project without them knowing about it.

This is actually a cool idea. I don't know how I'd manage to get people's details for giving refund without co-operating with the fundraising platform, and my impression is that most platforms are hesitant to do things like this. If you know of a platform that would be keen on trying this, please tell me!

Yes… You could talk with Giveth about it. They’re using blockchains, so perhaps you can build on top of them without them having to do anything. But what I’ve done in the past is that people sign up with my platform, get a code, and put the code in the public comment that they can attach to a contribution on the third-party platform. Then if they want to claim any rights attached to the contribution from me, I check that the code is the right one, and if it is, believe them that they’re either the person who made the contribution or that the person who made the contribution wanted to gift it to them.

I don't quite understand this point. You could work on AI Safety and donate to animal charities if you don't want to free-ride. 

Well, let’s say I barely have the money to pay for my own cost of living, or that I consider a number of AI safety orgs to be even more cost-effective uses of my money.

Comment by Dawn Drescher (Telofy) on The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts) · 2023-08-30T07:42:13.034Z · LW · GW

Wonderful that you’re working on this! I’m with AI Safety Impact Markets, and I suspect that we will need a system like this eventually. We haven’t received a lot of feedback to that effect yet, so I haven’t prioritized it, but there are at least two applications for it (for investors and (one day, speculatively) for impact buyers/retrofunders). We’re currently addressing it with a bonding curve auction of sorts, which incentivizes donors to come in early, so that they’re also not so incentivized to wait each other out. The incentive structures are different, though, so maybe a combination would be nice.

That last link goes to an article that links to GiveWell’s donor’s dilemma. Some people outside EA who I’ve talked to were surprised by it. They had found that donors flock to uncontroversially good donation opportunities and don’t try to save their donations for niche projects. I’m thinking that might be because they donate to look good to others, which works better if they actually look good to others and not weird. So maybe DACs are mostly needed for coordination among utilitarians or some range of consequentialists? Then again these are different problems – freeriding because you want the benefits of the public good for free vs. freeriding so that you can use your funds for a niche project. Both are defections in assurance games though.

Sorry if I missed it in the article, but who are the parties you would like to get on board to put up the refund bonuses? Can you bootstrap them by using DACs to fundraise for bigger and bigger refund bonuses?

Are you aware of any issues with securities law, since people can make monetary profits off refund bonuses? I might be able to get you in touch with someone who knows these things when it comes to the US. I imagine (no idea really) that the law of the country of the person who starts a fundraiser will be what matters legally.

Can you maybe just pick out a charity fundraising platform that has funding thresholds and refunds (and is used by consequentialists), put Pol.is on top to rank the projects by how uncontroversial they are, and then offer proportional (to the uncontroversiality) refund bonuses to everyone who contributed to failed fundraisers? Maybe they can apply, prove that they’ve contributed to a failed fundraiser, and get the payout? Maybe, if you’re completely independent of the people who put up the fundraisers, you won’t risk getting them in trouble with the SEC?

More philosophically: I think it’s hard to distinguish freeriding from division of labor in economies of scale when there is no explicit coordination. DACs are probably useful in the majority of cases, but there’s a minority of cases where people (usually rather altruistic people) should pick their battles, focus on the stuff where they can make the greatest marginal contribution, and “freeride” on the social change that others effect in other areas. Put differently, I would love a non-speciesist world and a world where funding in AI safety is allocated efficiently. I’m currently working only on the second problem, so one could argue that I’m freeriding on others’ solutions to the first. But in a different sense it’s just a division of labor among people who want to make the world a better place in general. So some seeming freerider problems probably need fixing with DACs, while others might not be problems at all or might be improved with better communication without anyone having to put up a refund bonus. (But money is flexible, so DACs for fundraising are probably robustly useful.)

Awesome effort! Please keep us posted of how it’s going!

Btw., I think it would be useful to mention what the article is about in the title. I would not have read it (even though I’m very interested in the topic) if Dony hadn’t told me that it’s about DACs!

Comment by Dawn Drescher (Telofy) on Report on modeling evidential cooperation in large worlds · 2023-07-13T19:47:22.856Z · LW · GW

Amazing work! So glad it’s finally out in the open!

Comment by Telofy on [deleted post] 2023-05-08T16:17:31.843Z

My perhaps a bit naive take (acausal stuff, other grabby aliens, etc.) is that a conflict needs at least two, and humans are too weak and uncoordinated to be much of an adversary. Hence I’m not so worried about monopolar takeoffs. Not sure, though. Maybe I should be more worried about those too.

Comment by Telofy on [deleted post] 2023-05-06T19:50:57.426Z

I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use

I’m not a fan of idealizing superintelligences. 10+ years ago that was the only way to infer any hard information about worst-case scenarios. Assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sort of training regimes through which they might learn bargaining, and what sorts of bargaining solutions they might completely unreflectively employ in specific situations. We can reason about what sorts of training regimes will instill which decision theories in AIs, so why not do the same for bargaining.

If we think we can punt the problem to them, then we need to make sure they reflect on how they bargain and the game-theoretic implications of that. We may want to train them to seek out gains from trade, as is useful in a generally cooperative environment, rather than to seek out exploits, as would be useful in a more hostile environment.

If we find that we can’t reliably punt the problem to them, we now still have the chance to decide on the right (or a random) bargaining solution and train enough AIs to adopt it (more than 1/3rd? Just particularly prominent projects?) to make it the Schelling point for future AIs. But that window will close when they (OpenAI, DeepMind, vel sim.) finalize the corpus of the training data for the AIs that’ll take over the world.

I don’t care about wars between unaligned AIs, even if they do often have them

Okay. I’m concerned with scenarios where at least one powerful AI is at least as (seemingly) well aligned as GPT-4.

Secondly, you need to assume that the pessimization of the superintelligence’s values would be bad, but in fact I expect it to be just as neutral as the optimization.

Can you rephrase? I don’t follow. It’s probably “pessimization” that throws me off?

why would either of them start the war?

Well, I’m already concerned about finite versions of that. Bad enough to warrant a lot of attention in my mind. But there are different reasons why that could happen. The one that starts the war could’ve made any of a couple different mistakes in assessing their opponent. It could make mistakes in the process of readying its weapons. Finally, the victim of the aggression could make mistakes assessing the aggressor. Naturally, that’s implausible if superintelligences are literally so perfect that they cannot make mistakes ever, but that’s not my starting point. I assume that they’re going to be about as flawed as the NSA, DoD, etc., only in different ways.

Comment by Telofy on [deleted post] 2023-05-06T19:19:52.014Z

Sorry for glossing over some of these. E.g., I’m not sure if you consider ems to be “scientifically implausible technologies.” I don’t, but I bet there are people who could make smart arguments for why they are far off.

Reason 5 is actually a reason to prioritize some s-risk interventions. I explain why in the “tractability” footnote.

Comment by Telofy on [deleted post] 2023-05-06T19:16:25.685Z

Woah, thanks! I hadn’t seen it!

Comment by Telofy on [deleted post] 2023-05-06T19:14:11.377Z

No, just a value-neutral financial instrument such as escrow. If two people can fight or trade, but they can’t trade because they don’t trust each other, they’ll fight. That loses out on gains from trade, and one of them ends up dead. But once you invent escrow, there’s suddenly, in many cases, an option to do the trade after all, and both can live!

Comment by Telofy on [deleted post] 2023-05-06T19:01:21.601Z

I’ve thought a bunch about acausal stuff in the context of evidential cooperation in large worlds, but while I think that that’s super important in and of itself (e.g., it could solve ethics), I’d be hard pressed to think of ways in which it could influence thinking about s-risks. I prefer to think about the perfectly straightforward causal conflict stuff that has played out a thousand times throughout history and is not speculative at all – except applied to AI conflict.

But more importantly it sounds like you’re contradicting my “tractability“ footnote? In it I argue that if there are solutions to some core challenges of cooperative AI – and finding them may not be harder than solving technical alignment – then there is no deployment problem: You can just throw the solutions out there and it’ll be in the self-interest of every AI, aligned or not, to adopt them.

Comment by Telofy on [deleted post] 2023-05-06T14:24:07.556Z

I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?

I was just agreeing. :-3 In mainstream ML circles there is probably a taboo around talking about AI maybe doing harm or AI maybe ending up uncontrollable etc. Breaking that taboo was, imo, a good thing because it allowed us to become aware of the dangers AI could pose. Similarly, breaking a taboo around talking about things worse than death can be helpful to become aware of ways in which we may be steering toward s-risks.

It's basically like this 

I see! I have a bunch of friends who would probably consider their lives not worth living. They often express the wish to not have been born or at least consider their current well-being level to be negative. But I think only one of them might be in such a negative feedback loop, and I’m probably misdiagnosing her here. Two of them are bedridden due to Long Covid and despite their condition have amassed a wealth of knowledge on virus-related medicine, probably by googling things on their phones while lying down for ten minutes at a time. Others have tried every depression drug under the sun. Others have multiple therapists. They are much more held back by access and ability than by motivation, even though motivation is probably also hard to come by in that state.

Surely you can see that this isn't common, and the normal response is to just be broken until you die.

Idk, Harold and Maude is sort of like that. I’ve actually done a back-of-the-envelope calculation, which is perhaps uncommon, but the general spirit of the idea seems normal enough to me? Then again I could easily be typical-minding.

Comment by Telofy on [deleted post] 2023-05-06T14:00:00.434Z

I’d prefer to keep these things separate, i.e. (1) your moral preference that “a single human death is worse than trillions of years of the worst possible suffering by trillions of people” and (2) that there is a policy-level incentive problem that implies that we shouldn’t talk about s-risks because that might cause a powerful idiot to take unilateral action to increase x-risk.

I take it that statement 1 is a very rare preference. I, for one, would hate for it to be applied to me. I would gladly trade any health state that has a DALY disability weight > 0.05 or so for a reduction of my life span by the same duration. I’m not saying that you shouldn’t live forever, but I only want to if my well-being is sufficiently high (around or a bit higher than my current level).

Statement 2 is more worrying to me if taken at face value – but I’m actually not so worried about it in practice. What’s much more common is that people seek power for themselves. Some of them are very successful with it – Ozymandias, Cyrus the Great, Alexander the Great, Jesus, Trajan, … – but they are so much fewer than all the millions and millions of narcissistic egomaniacs that try. Our civilization seems to be pretty resilient against such power grabs.

Corollary: We should keep our civilization resilient. That’s equally important to me because I wouldn’t want someone to assume power and undemocratically condemn all of us to hell to eke out the awful kind of continued existence that comes with it.

Comment by Telofy on [deleted post] 2023-05-06T13:29:26.074Z

Huh, thanks! 

Comment by Telofy on [deleted post] 2023-05-06T13:27:59.164Z

The example I was thinking of is this one. (There’s a similar thread here.) So in this case it’s the first option – they don’t think they’ll prefer death. But my “forever” was an extrapolation. It’s been almost three years since I read the comment.

I’m the ECL type of intersubjective moral antirealist. So in my mind, whether they really want what they want is none of my business, but what that says about what is desirable as a general policy for people we can’t ask is a largely empirical question that hasn’t been answered yet. :-3

Comment by Telofy on [deleted post] 2023-05-06T13:11:22.916Z

That sounds promising actually… It has become acceptable over the past decade to suggest that some things ought not to be open-sourced. Maybe it can become acceptable to argue for DRM for certain things too. Since we don’t yet have brain scanning technology, I’d also be interested in an inverse cryonics organization that has all the expertise to really really really make sure that your brain and maybe a lot of your social media activity and whatnot really gets destroyed after your death. (Perhaps even some sort of mechanism by which suicide and complete scrambling is triggered automatically the second humanity loses control – but that seems infeasibly risky and hard to construct.)

To clarify, I don’t believe in identity, so this does not actually do much useful work directly, but it could find demand, and it could push open the Overton window a bit to allow for more discussion of how we really want to protect em-relevant data at scale. It’s probably all too slow though.

Comment by Telofy on [deleted post] 2023-05-06T13:01:38.084Z

Yeah, that’s a known problem. I don’t quite remember what the go-to solutions were that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be something that is almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work or it would still run into the problem with conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about AIs creating infinite series of surrogate goals to protect the previous surrogate goal. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.

Comment by Telofy on [deleted post] 2023-05-06T12:49:49.181Z

In the tractability footnote above I make the case that it should be at least vastly easier than influencing the utility functions of all AIs to make alignment succeed.

Comment by Telofy on [deleted post] 2023-05-05T17:31:16.335Z

Interesting take! 

Friend circles of mine – which, I should note, don’t to my knowledge overlap with the s-risks-from-AI researchers I know – do treat suicide as a perfectly legitimate thing you can do after deliberation, like abortion or gender-affirming surgery. So there’s no particular taboo there. Hence, maybe, why I also don’t recoil from considering that the future might be vastly worse than the present.

But it seems like a rationalist virtue not to categorically recoil from certain considerations.

Could you explain the self-fulfilling prophecy idea more, though? School was bad for me, but since then I’ve been hoping to live long enough with net positive valence to outweigh that time rather than trying to sabotage myself as a result. Then again it could be that there is some more complicated mechanism at work underneath, e.g., that oppression causes both thanatos (death wish) and low self-esteem, and that low self-esteem leads one to think that one doesn’t deserve good things if they come at a cost, so that things get worse and the thanatos increases. But in that example thanatos is a concomitant – there is no chain of causal arrows from thanatos to more thanatos…

Comment by Telofy on [deleted post] 2023-05-05T16:38:33.435Z

Thx! Yep, your edit basically captures most of what I would reply. If alignment turns out so hard that we can’t get any semblance of human values encoded at all, then I’d also guess that hell is quite unlikely. But there are caveats, e.g., if there is a nonobvious inner alignment failure, we could get a system that technically doesn’t care about any semblance of human values but doesn’t make that apparent because ostensibly optimizing for human values appears useful for it at the time. That could still cause hell, even with a higher-than-normal probability.

Comment by Telofy on [deleted post] 2023-05-05T16:01:44.050Z

Thanks for linking that interesting post! (Haven’t finished it yet though.) Your claim is a weak one though, right? Only that you don’t expect the entire lightcone of the future to be filled with worst-case hell, or less than 95% of it? There are a bunch of different definitions of s-risk, but what I’m worried about definitely starts at a much smaller-scale level. Going by the definitions in that paper (p. 3 or 391), maybe the “astronomical suffering outcome” or the “net suffering outcome.”

Comment by Telofy on [deleted post] 2023-05-05T15:31:18.242Z

Interesting take! Obviously that’s different for me and many others, but you’re not alone with that. I even know someone who would be ready to cook in a lava lake forever if it implies continuing to exist. I think that’s also in line with the DALY disability weights, but only because they artificially scale them to the 0–1 interval.

So I imagine you’d never make such a deal as shortening your life by three hours in exchange for not experiencing one hour of the worst pain or other suffering you’ve experienced?

Comment by Telofy on [deleted post] 2023-05-05T15:24:45.852Z

but some wonkier approaches could be pretty scary.

Yeah, very much agreed. :-/

in particular, an aligned AI sells more of its lightcone to get baby-eating aliens to eat their babies less, and in general a properly aligned AI will try its hardest to ensure what we care about (including reducing suffering) is satisfied, so alignment is convergent to both.

Those are some good properties, I think… Not quite sure in the end.

But your alignment procedure is indirect, so we don’t quite know today what the result will be, right? Then the question whether we’ll end up on an s-line depends on all the tons of complexity that usually comes with games with many participants. In this case the s-line results from the goals of another agent who is open to trade (hasn’t irrevocably committed). But there are many other paths to s-lines. (Am I using the -line nomenclature correctly? First time I heard about it. What are p-lines?)

(note that another reason i don't think about S-risks too much is that i don't think my mental health could handle worrying about them a lot, and i need all the mental health i can get to solve alignment.)

In my experience, the content of what one thinks about gets abstracted away at some point so that you cease to think about the suffering itself. Took about 5 years for me though… (2010–15)

Comment by Telofy on [deleted post] 2023-05-05T14:51:02.469Z

Some promising interventions against s-risks that I’m aware of are:

  1. Figure out what’s going on with bargaining solutions. Nash, Kalai, or Kalai-Smorodinsky? Is there one that is privileged in some impartial way? (See the sketch after this list for the standard definitions.)
  2. Is there some sort of “leader election” algorithm over bargaining solutions?
  3. Do surrogate goals work, are they cooperative enough?
  4. Will neural-net based AIs be comprehensible to each other, if so, what does the open source game theory say about how conflicts will play out?
  5. And of course CLR’s research agenda.
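For reference, here are the standard two-player versions of the bargaining solutions from point 1, over a feasible set $S$ with disagreement point $d = (d_1, d_2)$ and ideal points $b_i$ – textbook definitions, nothing specific to the s-risk context:

```latex
\text{Nash:}\quad (u_1^*, u_2^*) = \arg\max_{(u_1, u_2) \in S} \,(u_1 - d_1)(u_2 - d_2)

\text{Kalai (egalitarian):}\quad u_1^* - d_1 = u_2^* - d_2 \text{ on the Pareto frontier}

\text{Kalai–Smorodinsky:}\quad \frac{u_1^* - d_1}{b_1 - d_1} = \frac{u_2^* - d_2}{b_2 - d_2} \text{ on the Pareto frontier}
```

The open question in point 1 is whether any of these (or some other solution) is privileged in a way that bargainers could converge on without prior coordination.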

Interpretability research is probably interesting for both, but otherwise I don’t see a lot of overlap in research topics. Maybe comparing the CLR research agenda against an alignment research agenda could help quantify the overlap a bit more. (There is probably a lot of transferability in the required skills though – game theory, ML, etc.)

Comment by Telofy on [deleted post] 2023-05-05T14:37:09.486Z

I also know plenty of cheerful ones. :-3

Comment by Telofy on [deleted post] 2023-05-05T14:36:24.513Z

Interesting. Do I give off that vibe – here or in other writings?

Comment by Telofy on [deleted post] 2023-05-04T22:40:55.107Z

Thx! I’ll probably drop the “more heavily” for stylistic reasons, but otherwise that sounds good to me!

Comment by Telofy on [deleted post] 2023-05-04T22:38:51.515Z

I suppose my shooting range metaphor falls short here. Maybe alignment is like teaching a kid to be an ace race car driver, and s-risks are accidents on normal roads. There it also depends on the details whether the ace race car driver will drive safely on normal roads.

Comment by Telofy on [deleted post] 2023-05-04T22:33:54.973Z

Oh, true! Digital sentience is also an important point! A bit of an intuition pump is that if you consider a certain animal to be sentient (at least with some probability), then an em of that animal’s brain may be sentient with a similar probability. If an AI is powerful enough to run such ems, the question is no longer whether digital sentience is possible but why an AI would run such an em.

The Maslow hierarchy is reversed for me, i.e. I’d rather be dead/disempowered than tortured, but that’s just a personal thing. In the end it’s more important what the acausal moral compromise says, I think.

Comment by Telofy on [deleted post] 2023-05-04T22:24:18.443Z

Good point. I can still change it. What title would you vote for? I spent a lot of time vacillating between titles and don’t have a strong opinion. These were the options that I considered:

  1. Why not s-risks? A poll.
  2. Why are we so complacent about AI hell?
  3. Why aren’t we taking s-risks from AI more seriously?
  4. Why do so few people care about s-risks from AI?
  5. Why are we ignoring the risks of AI hell?
  6. What’s holding us back from addressing s-risks from AI?
  7. Why aren’t we doing more to prevent s-risks from AI?
  8. What will it take to get people to care about s-risks from AI?

Comment by Telofy on [deleted post] 2023-05-04T19:16:10.750Z

I agree with what Lukas linked. But there are also various versions of the Waluigi Effect, so that alignment, if done wrong, may increase s-risk. Well, and I say in various answers and in the post proper that I’m vastly more optimistic about reducing s-risk than having to resort to anything that would increase x-risk.

Comment by Telofy on [deleted post] 2023-05-04T19:10:23.444Z

Comment by Telofy on [deleted post] 2023-05-04T19:08:42.686Z

Yeah… When it comes to the skill overlap, having alignment research aided by future pre-takeoff AIs seems dangerous. Having s-risk research aided that way seems less problematic to me. That might make it accessible (now or in a year) for people who have struggled with alignment research. I also wonder whether there is maybe still more time for game-theoretic research in s-risks than there is in alignment. The s-risk-related problems might be easier, so they can perhaps still be solved in time. (NNTR, just thinking out loud.)

Comment by Telofy on [deleted post] 2023-05-04T18:29:35.741Z

Oooh, good point! I’ve certainly observed that in myself in other areas.

Like, “No one is talking about something obvious? Then it must be forbidden to talk about and I should shut up too!” Well, no one is freaking out in that example, but if someone were, it would enhance the effect.

Comment by Telofy on [deleted post] 2023-05-04T17:50:27.420Z

Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).

Comment by Telofy on [deleted post] 2023-05-04T17:50:10.710Z

Too unknown. Finally there’s the obvious reason that people just don’t know enough about s-risks. That seems quite likely to me.

Comment by Telofy on [deleted post] 2023-05-04T17:49:53.216Z

Too unpopular. Maybe people are motivated by what topics are in vogue in their friend circles, and s-risks are not?

Comment by Telofy on [deleted post] 2023-05-04T17:49:36.037Z

Personal fit. Surely, some people have tried working on s-risks in different roles for some substantial period of time but haven’t found an angle from which they can contribute given their particular skills.

Comment by Telofy on [deleted post] 2023-05-04T17:49:03.853Z

There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of what the risk of a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid that the risk is quite macroscopic.

The second version ignores the expected value. I acknowledge that expected value calculus has its limitations, but if we use it at all, and we clearly do, a lot, then there’s no reason to ignore its implications specifically for s-risks. With all ITN factors taken together but ignoring probabilities, s-risk work beats other x-risk work by a factor of 10^12 for me (your mileage may vary), so if it’s just 10x less likely, that’s not decisive for me.
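As a toy calculation (using my own rough numbers from above, nothing more):

```latex
\frac{\mathrm{EV}(\text{s-risk work})}{\mathrm{EV}(\text{other x-risk work})}
\approx \underbrace{10^{12}}_{\text{ITN factors sans probability}} \times \underbrace{10^{-1}}_{\text{relative probability}}
= 10^{11}
```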

I don’t have a response to the third version.

Comment by Telofy on [deleted post] 2023-05-04T17:48:37.837Z

Too unlikely. I’ve heard three versions of this concern. One is that s-risks are unlikely. I simply don’t think they are, as explained above in the post proper. The second version is that it’s 1/10th of extinction, hence less likely, hence not a priority. The third version of this take is that it’s just psychologically hard to be motivated for something that is not the mode of the probability distribution of how the future will turn out (given such clusters as s-risks, extinction, and business as usual). So even if s-risks are much worse and only slightly less likely than extinction, they’re still hard for people to work on.

Comment by Telofy on [deleted post] 2023-05-04T17:48:07.381Z

That sounds to me like, “Don’t talk about gun violence in public or you’ll enable people who want to overthrow the whole US constitution.” Directionally correct but entirely disproportionate. Just consider that non-negative utilitarians might hypothetically try to kill everyone to replace them with beings with greater capacity for happiness, but we’re not self-censoring any talk of happiness as a result. I find this concern to be greatly exaggerated.

In fact, moral cooperativeness is at the core of why I think work on s-risks is a much stronger option than alignment, as explained in the tractability section above. So concern for s-risks could even be a concomitant of moral cooperativeness and can thus even counter any undemocratic, unilateralist actions by one moral system.

Note also that there is a huge chasm between axiology and morality. I have pretty strong axiological intuitions but what morality follows from that (even just assuming the axiology axiomatically – no pun intended) is an unsolved research question that would take decades and whole think tanks to figure out. So even if someone values empty space over earth today, they’re probably still not omnicidal. The suffering-focused EAs I know are deeply concerned about the causal and acausal moral cooperativeness of their actions. (Who wants to miss out on moral gains from trade after all!) And chances are this volume of space will be filled by some grabby aliens eventually, so assured permanent nonexistence is not even on the table.

Comment by Telofy on [deleted post] 2023-05-04T17:47:35.934Z

NNTs. Some might argue that “naive negative utilitarians that take ideas seriously” (NNTs) want to destroy the world, so that any admissions that s-risks are morally important in expectation should happen only behind closed doors and only among trusted parties.

Comment by Telofy on [deleted post] 2023-05-04T17:46:33.916Z

I want to argue with the Litany of Gendlin here, but what work on s-risks really looks like in the end is writing open source game theory simulations and writing papers. All dry academic stuff that makes it easy to block out thoughts of suffering itself. Just give it a try! (E.g., at a CLR fellowship.)

I don’t know if that’s the case, but s-risks can be reframed:

  1. We want to unlock positive-sum trades for the flourishing of our descendants (biological or not).
  2. We want to distribute the progress and welfare gains from AI equitably (i.e. not have some sizable fractions of future beings suffer extremely).
  3. Our economy only works thanks to trust in institutions and jurisprudence. The flourishing of the AI economy will require that new frameworks be developed that live up to the challenges of the new era!

These reframings should of course be followed up with a detailed explanation so as not to be dishonest. Their purpose is just to show that one can pivot one’s thinking about s-risks such that the suffering is not so front and center. This would, if anything, reduce my motivation to work on them, but that’s just me.