The idea is, normally just do straightforwardly good things. Be cooperative, friendly, and considerate. Embrace the standard virtues. Don't stress about the global impacts or second-order altruistic effects of minor decisions. But also identify the very small fraction of your decisions which are likely to have the largest effects and put a lot of creative energy into doing the best you can.
I agree with this, but would add that IMO, after you work out the consequentialist analysis of the small set of decisions that are worth intensive thought/effort/research, it is quite worthwhile to additionally work out something like a folk ethical account of why your result is correct, or of how the action you're endorsing coheres with deep virtues/deontology/tropes/etc.
There are two big upsides to this process:
IMO, this is similar to math: we use heuristics, intuitions, and informal reasoning a lot to guess how to do things, and sometimes we also use detailed, not-condensed-by-heuristics algebra or mathematical proof steps to work out how a thing goes that we don't yet find intuitive or obvious. But after writing a math proof the sloggy way, it's good to go back over it, ask "why did it work" and "what was the true essence of the proof, that made it tick," and see if there is now a way to "see it at a glance" – to locate ways of seeing that will make future such situations more obvious, and that can live in one's system 1 and aesthetics as well as in one's sloggy explicit reasoning.
Or, again, in coding: usually we can use standard data structures and patterns. Sometimes we have to hand-invent something new. But after coming up with the something new, it's often good to condense it into readily parsable/rememberable/reusable form, instead of leaving it as hand-rolled spaghetti code.
I don't have my concepts well worked out here yet, which is why I'm being so long-winded and full of analogies. But I'm pretty sure that folk ethics, where we have it worked out, has a bunch of advantages over consequentialist reasoning that're kind of like those above.
For a concrete example:
AI safety obviously matters. The folk ethics of "don't let everyone get killed if you can help it" are solid, so that part's fine. But in terms of tactics: I really think we need to work out a "commonsense good" or "folk ethics" type account of things like:
This sounds similar to whether a contemporary machine learning model can break a cryptographic cipher, a hash function, or something like that.

nothoughtsheadempty on lunaiscoding's Shortform
Hello, I'm working on mechanistic interpretability alongside technical alignment. I'm interested in AI governance and how technical work can inform policy.

philh on EA Vegan Advocacy is not truthseeking, and it's everyone's problem
I agree that I have not stated a proper disclaimer, which was quite warranted, given the thrust of the post.
To clarify: it's not clear to me whether you think it would have been warranted of you to give a disclaimer, or whether you think not-giving it was warranted?
My only intended point was that, while a lot of people do veganism wrong and some are not suited to it at all, an average person can be vegan without adverse health effects, as long as they eat a varied and enriched plant-based diet, periodically check their vitamin/nutrient/mineral levels, and make dietary adjustments as necessary.
Your original comments said nothing about periodically checking levels and making adjustments. So if that was part of your intended point, you executed your intent very poorly. (My guess would be that four months later, you misremember your epistemic state at the time. Which isn't a big deal. Whatever the reason for the omission, the omission seems worth noting.)
(I separately think it's worth noting that your comment sounds like you're observing "my partner does X" and concluding "the average person can do X", which is obviously not good reasoning. Like, maybe you have more evidence than you're letting on, but the natural-to-me reading of your post is that you're making a basic error.)

wbrom42-gmail-com on My Effortless Weightloss Story: A Quick Runthrough
Thanks for the reply

dawn-drescher on AI Safety Impact Markets: Your Charity Evaluator for AI Safety
Awww, thanks for the input!
I actually have two responses to this: one from the perspective of the current situation (our system in phase 1: very few donors, very little money going around, most donors not knowing where to donate), and one from the perspective of the final ecosystem that we want to see if phase 3 comes to fruition one day (lots of pretty reliable governmental and CSR funding, highly involved for-profit investors, etc.).
The second is more interesting but also more speculative. The diagram here [LW · GW] shows both the verifier/auditor/evaluator and the standardization firms. I see the main responsibility as lying with the standardization firms, and that's also where I would like my company to position itself if we reach that stage (possibly including the verification part).
One precedent for that is the Impact Genome. It currently recognizes (by my latest count) 176 kinds of outcomes. They are pretty focused on things that I would class as deploying solutions in global development, but they're already branching out into other fields as well. Extend that database with outcomes like different magnitudes of career plan changes (cf. 80,000 Hours), years of dietary change, new and valuable connections between collaborators, etc., and you'll probably end up with a database of several hundred outcome measures, most of which are not just about publishing in journals. (In the same section I mention some other desiderata that diverge a bit from how the Impact Genome is currently used. That article is generally the more comprehensive and interesting one, but for some reason it got fewer upvotes.)
In this world there's also enough financial incentive for project developers to decide what they want to do based on what is getting funded, so it's important to set sensible incentives.
It's possible that even in this world there'll be highly impactful and important things to do that'll somehow slip through the cracks. Absent cultural norms around how to attribute the effects of some more obscure kind of action, monetizing it might lead to too many court battles to even attempt. I'm thinking of tricky cases that are all about leveraging the actions of others, e.g., when doing vegan outreach work. Currently there are no standards for how to attribute such work (how much reward should the leaflet designer get, how much should the activist get, how much should the new vegan or reducetarian get?). But over time more and more of those will probably get solved as people agree on arbitrary assignments. (Court battles cost a lot of money, and new vegans will not want to financially harm the people who first convinced them to go vegan, so the activist and the leaflet designer are probably in good positions to monetize their contributions, and just have to talk to each other about how to split the spoils.)
But we're so so far away from that world.
In the current world I see three reasons for our current approach:
First, my rough impression from the projects on our platform that I know better is that for them it's mostly that they're, by default, not getting any funding or just some barely sufficient baseline funding from their loyal donors. With Impact Markets, they might get a bit of money on top. The loyal donors are probably usually individuals with personal ties to the founders. The funding that they can get on top is thanks to their published YouTube videos, blog articles, conference talks, etc. So one funding source is thanks to friendships; the other is thanks to legible performance. But there's no funding from some large donor who is systematically smarter and more well-connected than our evaluators + project scout network.
And even really smart funders like Open Phil will look at legible things like the track record of a project developer when making their grant recommendations. If the project developer has an excellent track record of mentioning just the right people and topics to others at conferences, then no one, not Open Phil nor even the person themselves, will be able to take that into account because of how illegible it is.
Second, we're probably embedded in different circles (I'm guessing you're more thinking of academic researchers at university departments where they can do AI safety research?), but in my AI safety circles there are the people who have savings from their previous jobs that they're burning through, maybe some with small LTFF grants, and some that support each other financially or with housing. So by and large it's either they get a bit of extra money through Impact Markets and can continue their work for another quarter or they drop out of AI safety work and go back to their industry jobs. So even if we had enough funding for them, it would just prevent them from going back to unrelated work for a bit longer, not change what they're doing within AI safety.
A bit surprisingly, maybe, one of our biggest donors on the platform is explicitly using it to look for projects that push for a pause or moratorium on AGI development [LW · GW], largely through public outreach. That can be checked by evaluators through newspaper reports on the protests, and other photos and videos, but it'll be unusually opaque how many people they reached, whether any of them were relevant, and what they took away from it. So far our track record seems to be one of fostering rather illegible activism rather than distracting from it, though admittedly that has happened a bit randomly – Greg is just really interested in innovative funding methods.
Third, currently the incentives are barely enough to convince project developers to spend 5 minutes to post their existing proposals to our platform, and only in some cases. (In others I've posted the projects for them and then reassigned them to their accounts.) They are not enough to cause project developers to make sure that they have the participants' permission to publish (or share with external evaluators) the recordings of their talks. They're not enough for them to design feedback surveys that shed light on how useful an event was to the participants. (Unless they already have them for other reasons.)
And it makes some sense too: We've tracked $391,000 in potential donations from people who want to use our platform; maybe 10% of those will follow through; divide that by the number of projects (50ish), and the average project can hope for < $1,000. (Our top projects can perhaps hope for $10k+ while the tail projects can probably not expect to fundraise anything, but the probability distribution math is too complicated for me right now. Some project developers might expect a Pareto distribution where they'd have to get among the top 3 or so for it to matter at all; others might expect more of a log-normal distribution.) Maybe they're even more pessimistic than I am in their assumptions, so I can see that any change that would require a few hours of work does not seem worth it to them at the moment.
If we become a bit more successful in building momentum behind our platform, maybe we can attract 100+ donors with > $1 million in total funding, so that we can present a stronger incentive for project developers. But even then I think what would happen is rather that they'll do such things as design feedback surveys to share with evaluators or record unconference talks to share them etc., but not to fundamentally change what they're doing to make it more provable.
So I think if we scale up by 3–4 orders of magnitude, we'll probably still do a bit better with our system than existing funders (in terms of scaling down, while having similarly good evaluations), but then we'll need to be careful to get various edge cases right. Though even then I don't think mistakes will be path dependent. If there is too little funding for some kind of valuable work, and the standardization firms find out about it, they can design new standards for those niches.
I hope that makes sense to you (and also lunatic_at_large), but please let me know if you disagree with any of the assumptions or conclusions. I see, for example, that even now, post-FTX, people are still talking about a talent constraint (rather than a funding constraint) in AI safety, which I don't see at all. But maybe the situation is different in the US, and we should rebrand to impactmarkets.eu or something! xD

transhumanist_atom_understander on Conditionals All The Way Down
I think this thought has analogues in Bayesian statistics.
We choose a prior. Let's say, for the effect size of a treatment. What's our prior? Let's say, Gaussian with mean 0, and standard deviation equal to the typical effect size for this kind of treatment.
But how do we know that typical effect size? We could actually treat this prior as a posterior, updated from a uniform prior by previous studies. This would be a Bayesian meta-analysis.
I've never seen anyone formally do a meta-analysis just to get a prior. At some point, you decide your assumed probability distributions are close enough that more effort wouldn't change the final result. Really, all mathematical modeling is like this. We model the Earth as a point, or a sphere, or a more detailed shape, depending on what we can get away with in our application. We make this judgment informally, but we expect a formal analysis would back it up.
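A minimal sketch of this idea, with made-up numbers and hypothetical helper names: pool earlier studies' effect estimates into a Gaussian prior by precision weighting (a fixed-effect meta-analysis), then do the conjugate Gaussian update on a new study.

```python
def pool_studies(estimates, variances):
    """Precision-weighted pooling of previous effect estimates
    (fixed-effect meta-analysis) -> (mean, variance) of the prior."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    return mean, 1.0 / total_precision

def update(prior_mean, prior_var, obs, obs_var):
    """Conjugate update of a Gaussian prior with a Gaussian observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Three earlier studies of similar treatments (hypothetical effect
# sizes and sampling variances), pooled into a prior.
prior_mean, prior_var = pool_studies([0.2, 0.35, 0.1], [0.04, 0.09, 0.01])

# A new study observes an effect of 0.5 with variance 0.02; the
# posterior mean lands between the pooled prior mean and the new
# observation, with smaller variance than either.
post_mean, post_var = update(prior_mean, prior_var, 0.5, 0.02)
print(post_mean, post_var)
```

The same two functions cover both steps because the Gaussian is conjugate to itself: "previous studies updating a flat prior" and "new study updating the pooled prior" are the same arithmetic.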
As for these ranges and bounds... that reminds me of the robustness analysis they do in Bayesian statistics. That is, vary the prior and see how it affects the posterior. It's generally done within a parametric family of priors, so you just vary the parameters. The hope is that you get about the same results within some reasonable range of priors, but you don't get strict bounds.

neel-nanda-1 on Interpretability Externalities Case Study - Hungry Hungry Hippos
The rough TLDR of my opinion: interp seems reasonably far from being dangerous to capabilities, there's a ton of work to be done, and the value of building a thriving academic field around it seems super high. Significant barriers to publishing and sharing work are toxic to this, and IMO this consideration dominates over some capabilities externalities. I also think that getting capabilities people to pivot from whatever they're doing now to mech interp seems like a win: it's not clear this should be more capabilities-y than the counterfactual, but it's clearly better for alignment in expectation. I also think junior people get way too stressed about this and should chill out, since your first piece of work is highly unlikely to have any effects, positive or negative.

ben-lang on The ethic of hand-washing and community epistemic practice
A version of this kind of thinking is central to the book "Beginning of Infinity" by David Deutsch.
He thinks that most ideas (or memes) can be roughly categorized into (1) those that are vulnerable to criticism/testing, and (2) those that can survive critical evaluation. In an environment where people blindly accept what they are told by authorities, nothing favours the ideas that survive criticism, and therefore they are unlikely to survive - except maybe by coincidence. Instead, ideas that perform well on other metrics (such as "seeming important" or "making me look good when I express them") will take over.
In contrast, if we change to an environment where ideas are criticised, where people perform experiments or doubt the word of authorities, then the ideas that can survive criticism will have a survival advantage over their competition and will spread.
David Deutsch argues that the first kind of society (word from authority, the way things have always been done etc) has dominated most human societies for most of history, but that we are currently in a wonderfully exceptional time (starting around the enlightenment) where criticism, argument and experiment are accepted.
So, in his view, I think the "hand-washing" for ideas is criticising and questioning them.

mondsemmel on Linkpost: They Studied Dishonesty. Was Their Work a Lie?
What a quote:
At the end of Simmons’s unpublished post, he writes, “An influential portion of our literature is effectively a made-up story of human-like creatures who are so malleable that virtually any intervention administered at one point in time can drastically change their behavior.” He adds that a “field cannot reward truth if it does not or cannot decipher it, so it rewards other things instead. Interestingness. Novelty. Speed. Impact. Fantasy. And it effectively punishes the opposite. Intuitive Findings. Incremental Progress. Care. Curiosity. Reality.”