I am glad to hear you enjoyed the paper and that our conversation has inspired you to work more on this issue! As I mentioned, I now find the worries you lay out in the first paragraph significantly more pressing; thank you for pointing them out!
I do not think this follows; the "consensus" is that sentience is sufficient for moral status. It is not clearly the case that giving some moral consideration to non-human sentient beings would lead to the scenario you describe. Though see: https://www.tandfonline.com/doi/full/10.1080/21550085.2023.2200724
These are great points, thank you!
Remember that what SCEV tracks is not what the included individuals directly want, but what they would want after an extrapolation/reflection process that converges in the most coherent way possible. This means the result is almost certainly not the same as it would be without any extrapolation. Without extrapolation, something like what you suggest, such as sentient dust mites or ants taking over the utility function, is a real possibility. But with extrapolation it is much less clear: the models of the ants' extrapolated volition may want to uplift the actual ants to a super-human level, just as our models of human extrapolated volition might want to do with us humans. Furthermore, given that SCEV would try to maximize coherence between satisfying the various volitions of the included beings, the superintelligence would cause human extinction or something similar only if it were physically impossible for it, no matter how much it was able to self-improve, to bring about a more coherent result that better respected human volitions. This seems unlikely, but it is not impossible, so it is something to worry about if this proposal were implemented.
However, importantly, in the paper I DO NOT argue that we should implement SCEV instead of CEV. I only argue that we have some strong (pro-tanto) reasons to do so, even if we should not ultimately do so because there are other, even stronger (pro-tanto) reasons against it. This is why I say the following in the conclusion: "In this paper, I have shown why we have some very strong pro-tanto reasons in favour of implementing SCEV instead of CEV. This is the case even if, all things considered, it is still ultimately unclear whether what is best is to try to implement SCEV or another proposal more similar to CEV."
This is truly what I believe, and it is why I put this conclusion in the paper rather than one stating that we SHOULD implement SCEV. I believe that stronger claim is wrong, and thus I did not make it, even though it would have made the paper less complex and more well-rounded.
I completely agree with you, and with the quote, that getting this right is a matter of both vital importance and urgency. I take this, along with the possibility of human extinction and s-risks, very seriously when conducting my research; it is precisely because of this that I have shifted from standard practical/animal ethics to this kind of research. It is great that we can agree on this. Thanks again for your thought-provoking comments; they have lowered my credence in favour of implementing SCEV all things considered (even if we do have the pro-tanto reasons I present in the paper).
What I mean by "moral philosophy literature" is the contemporary moral philosophy literature; I should have been more specific, my bad. And in contemporary philosophy it is universally accepted (though of course there might exist one philosopher or another who disagrees) that sentience, in the sense understood above as the capacity of having positively or negatively valenced phenomenally conscious experiences, is sufficient for moral patienthood. If this is the case, then it is enough to cite a published work or works in which this is evident. This is why I cite Clarke, S., Zohny, H. & Savulescu, J., 2021. You can consult this recently edited book on moral status, in which this claim is assumed throughout, and in the book you can find the sources for its justification.
Thank you! I will for sure read these when I have time. And thank you for your comments!
Regarding how to take into account the interests of insects and other animals/digital minds, see this passage I had to exclude from publication: [SCEV would apply an equal consideration of interests principle] "However, this does not entail that, for instance, if there is a non-negligible chance that dust mites or future large language models are sentient, the strength of their interests should be weighted the same as the strength of the interests of entities that we have good reasons to believe that it is very likely that they are sentient. The degree of consideration given to the interests or the desires of each being included in the extrapolation base should plausibly be determined by how likely it is that they have such morally relevant interests as a result of being sentient. We should apply something along the lines of Jeff Sebo’s Expected value principle, which is meant to determine the moral value of a given entity in cases of uncertainty about whether or not it is sentient (Sebo, 2018). In determining to what extent the interests, preferences and goals of a given entity (whose capacity for sentience we are uncertain about) should be included in the extrapolation base of SCEV, we should first come up with the best and most reliable credence available about whether the entity in question has morally relevant interests as a result of being sentient. And then we should multiply this credence by the strength (i.e. how bad it would be that those interests were frustrated/how good it would be that they were satisfied) that those interests would have if they were morally relevant as a result of the entity being sentient. The product of this equation should be the extent to which these interests are included in the extrapolation base. When determining our credence about whether the entity in question has morally relevant interests as a result of being sentient, we should also take into account the degree to which we have denied the existence of morally relevant interests to sentient beings different from us in the past. And we should acknowledge the biases present in us against reasonably believing in the extent to which different beings possess capacities that we would deem morally relevant."
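To make the weighting in that passage concrete, here is a minimal sketch of the expected-value idea (my illustration only, with made-up numbers and a hypothetical helper name; it is not taken from the paper or from Sebo's own formalization):

```python
def extrapolation_weight(credence_sentient: float, interest_strength: float) -> float:
    """Expected-value weighting: the credence that an entity has morally
    relevant interests (because it is sentient), multiplied by the strength
    those interests would have if it did."""
    return credence_sentient * interest_strength

# Hypothetical numbers for illustration only.
human_weight = extrapolation_weight(credence_sentient=0.99, interest_strength=1.0)      # ~0.99
dust_mite_weight = extrapolation_weight(credence_sentient=0.05, interest_strength=1.0)  # 0.05
print(human_weight, dust_mite_weight)
```

So an entity we are nearly certain is sentient would enter the extrapolation base with close to full weight, while an entity with only a small credence of sentience would still be included, but with proportionally less weight.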
Regarding intervening in ecosystems, and how to balance the interests/preferences of different animals: unless the extrapolated volitions of non-human animals choose/prefer that the actual animals be uplifted, I expect that something like this is what they would prefer: https://www.abolitionist.com/ It does not seem morally problematic to intervene in nature in ways like these, and I believe there are good arguments to defend this view.
I am arguing that given that
1. (non-human animals deserve moral consideration, and s-risks are bad (I assume this))
We have reasons to believe 2: (we have some pro-tanto reasons to include them in the process of value learning of an artificial superintelligence instead of only including humans).
There are people (whose objections I address in the paper) who accept 1 but do not accept 2. 1 is not justified for the same reasons as 2. 2 is justified by the reasons I present in the paper. 1 is justified by other arguments about animal ethics and the badness of suffering that are intentionally not present in the paper; I cite the places/papers where 1 is argued instead of arguing for it myself, which is standard practice in academic philosophy.
The people who believe 1 but not 2 do not merely have different feelings from mine; their objections to my view are (very likely) wrong, as I show when responding to them in the objections section.
Hi Roger, first, the paper is addressed to those who already do believe that all sentient beings deserve moral consideration and that their suffering is morally undesirable. I do not argue for these points in the paper, since they are already universally accepted in the moral philosophy literature.
This is why, for instance, I write the following: "sentience in the sense understood above as the capacity of having positively or negatively valenced phenomenally conscious experiences is widely regarded and accepted as a sufficient condition for moral patienthood (Clarke, S., Zohny, H. & Savulescu, J., 2021)".
Furthermore, it is just empirically not the case that people cannot be convinced "only by ethics and logic": for instance, many people who read Peter Singer's Animal Liberation changed their views in light of the arguments he provides in the first chapter and came to believe that non-human animals deserve equal moral consideration of interests. Changing one's ethical views when presented with ethical arguments is standard practice; it happens to moral philosophers when researching and reading moral philosophy. Of course, there is the is/ought gap, but this does not entail that one cannot convince someone that the most coherent version of their most fundamental ethical intuitions does not, in fact, lead where they believe it leads, but instead leads somewhere else, to a different conclusion. This happens all the time among moral philosophers: one presents an argument in favour of a view, and in many instances other philosophers are convinced by that argument and change their view.
In this paper I was not trying to argue that non-human animals deserve moral consideration or that s-risks are bad; as I said, I have assumed this. What I try to argue is that if this is true, then in some decision-making situations we would have some strong pro-tanto moral reasons to implement SCEV. In fact, I do not even argue that, conclusively, what we should do is try to implement SCEV.
Yes, and other points may also be relevant:
(1) Whether there are possible scenarios like these in which the ASI cannot find a way to adequately satisfy all the extrapolated volition of the included beings is not clear. There might not be any such scenarios.
(2) If these scenarios are possible, it is also not clear how likely they are.
(3) There is a subset of s-risks and undesirable outcomes (those coming from cooperation failures between powerful agents) that is a problem for all ambitious value-alignment proposals, including CEV and SCEV.
(4) In part, because of 3, the conclusion of the paper is not that we should implement SCEV if possible all things considered, but rather that we have some strong pro-tanto reasons in favour of doing so. It still might be best not to do so all things considered.
You write: "unlike for other humans, we don't have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it's more of a terminal goal."
First, it seems plausible that we do not, in fact, have instrumental reasons to include all humans, as I argue in section 4.2. There are some humans, such as "children, existing people who've never heard about AI or people with severe physical or cognitive disabilities unable to act on and express their own views on the topic", who, if included, would be included only because of our terminal goals, because they too matter.
If your view is that we only have reasons to include those whom we have instrumental reasons to include, then on your view the members of an AGI lab that developed ASI ought to include only themselves, if they believe (in expectation) that they can successfully do so. This view is implausible; it is implausible that this is what they would have most moral reason to do.
Whether this is implausible or not is a discussion within normative and practical ethics, and (somewhat contrary to what you seem to believe) these kinds of discussions can be had, are had all the time inside and outside academia, and are fruitful in many instances.
You write: "if that terminal goal is a human value, it's represented in CEV"
As I argue in Section 2.2, it is not clear that implementing CEV would prevent s-risks for certain. Rather, there is a non-negligible chance that it would not. If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present. If you want to argue that the occurrence of s-risks would not be bad, you are arguing for a particular view in normative and practical ethics, and as a result you should present arguments to justify certain views in these disciplines.
You write: "You don't justify why this is a bad thing over and above human values as represented in CEV."
This seems to be the major point of disagreement. In the paper, when I say s-risks are morally undesirable, i.e. bad, I use "bad" and "morally undesirable" as they are commonly used in analytic philosophy and outside academia, when, for example, someone says "Hey, you can't do that, that's wrong".
What exactly I, you or anyone else mean when we utter the words "bad", "wrong", and "morally undesirable" is the main question in the field of meta-ethics. Meta-ethics is very difficult, and contrary to what you suggest, I do not reject/disclaim moral realism, neither in the paper nor in my belief system. But I also do not endorse it. I am agnostic regarding this central question in meta-ethics; I suspend my judgment because I believe I have not yet sufficiently familiarized myself with the various arguments for and against the various possible positions. See: https://plato.stanford.edu/entries/metaethics/
This paper is not about metaethics, it is about practical ethics, and some normative ethics. It is possible to do both practical ethics and normative ethics while being agnostic or not being correct about metaethics, as is exemplified by the whole academic fields of practical and normative ethics. In the same way that it is possible to attain knowledge about physics, for instance, without having a complete theory of what knowledge is.
If you want, you can try to show that my paper that talks about normative ethics is incorrect based on considerations regarding metaethics but to do so, it would be quite helpful if you were able to present an argument with premises and a conclusion, instead of asking questions.
Thank you for specifically citing passages of the paper in your comment.
Okay, I understand better now.
You ask: "Where does your belief regarding the badness of s-risks come from?"
And you provide 3 possible answers I am (in your view) able to choose between:
- "From what most people value" 2. "From what I personally value but others don't" or 3. "from pure logic that the rest of us would realize if we were smart enough".
However, the first two answers do not seem to be answers to the question. My beliefs about what is or is not morally desirable do not come from "what most people value" or "what I personally value but others don't". In one sense, my beliefs about ethics, like everyone's beliefs about ethics, come from various physical causes (personal experiences, conversations I have had with other people, papers I have read), as in the formation of all other kinds of beliefs. There is another sense in which my beliefs about ethics seem to me to be justified by reasons/preferences. This second sense, I believe, is the one you are interested in discussing. And what exactly is the nature of the reasons or preferences that make me hold certain ethical views is what the discipline of meta-ethics is about. To figure out or argue for the right position in meta-ethics is outside the scope of this paper, which is why I have not addressed it there. Below I will reply to your other comment and discuss the meta-ethical issue further.
It is not clear to me exactly what "belief regarding suffering" you are talking about, or what you mean by "ordinary human values"/"your own personal unique values".
As I argue in Section 2.2., there is (at least) a non-negligible chance that s-risks occur as a result of implementing human-CEV, even if s-risks are very morally undesirable (either in a realist or non-realist sense).
Please read the paper, and if you have any specific points of disagreement, cite the passages you would like to discuss. Thank you.
Hi simon,
It is not clear to me which points of the paper you object to exactly, and I feel some of your worries may already be addressed in the paper.
For instance, you write: "And that's relevant because they are actually existing entities we are working together with on this one planet." First, some sentient non-humans already exist, that is, non-human animals. Second, the fact that we can work or not work with given entities does not seem to be what is relevant in determining whether they should be included in the extrapolation base or not, as I argue in sections 2., 2.1., and 4.2.
For utility-monster-type worries and worries about the possibility that "misaligned" digital minds would take control see section 3.2.
You write: "Well then, anyone can say Y is the all-important thing about anything obviously important to them. A religious person might want an AI to follow the tenets of their religion." Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion. As I argue in the paper and has been argued elsewhere, the first values you implement will change the ASI's behaviour in expectation, and as a result, what values to implement first cannot be left to the AI to be figured out. For instance, because we have better reasons to believe that all sentient beings can be positively or negatively affected in morally relevant ways than to believe that only given members of a specific religion matter, it is likely best to include all sentient beings than to include only the members of the religion. See Section 2.