Intuitions on Negative Utilitarianism
post by Isnasene · 2019-03-18T01:11:25.129Z · score: 7 (4 votes) · LW · GW · 5 comments
[X-posted from my personal blog: https://unseenmatters.wordpress.com/2019/01/09/intuitions-on-negative-utilitarianism/ with very slight adjustments]
[Epistemological Status: Mostly confident although I'm not completely happy with my argument in "A Justification of Arbitrarily Terrible Suffering Based On Simplicity Intuitions" even if I feel like it's pointing at something important. In general, I have few if any disagreements with established consensus on different forms of negative utilitarianism and thus feel outside-view confident about the things that I discuss.]
While I agree with both Toby Ord’s characterization of absolute negative utilitarianism (NU) and Lexical NU as unintuitive, and his treatment of weak NU as (when reasonably defined) not markedly different from typical utilitarianism, I find the discounting of lexical threshold NU less obvious. I analyze lexical threshold NU in further detail and find that lexical thresholds based on an individual’s perception of the intensity of their own suffering both capture the intuitions I would want NU to capture and avoid the feeling of arbitrariness associated with lexical thresholds in general. I extend the ideas in this form of NU beyond suffering and into happiness to produce an unusual form of utilitarianism that still maintains many of the same stances while relying on fewer assumptions. Moreover, I find that whether I accept or reject this utilitarianism depends on solutions in the field of value inference which do not currently exist (but may ultimately need to in the context of artificial intelligence alignment). Based on this, I conclude that, while many forms of NU are unacceptable to me, an ethical system with similar behavior to NU that directly taps into my intuitions exists and cannot be rejected using my current knowledge. This motivates me to have a partially suffering-focused lean in my altruistic decision-making.
Some Basic Attributes of Moral Systems
In my opinion, for a moral system to capture basic intuitions about morality, it must at minimum lead to choices that
I chose these based on the general experience I have with everyday decision-making, and this list is neither exhaustive nor universal (for instance, Tranquilists may object to requirement 2 though I stand by it per my thoughts here). However, I personally consider any moral system that fails these requirements and the experiences behind them to be inappropriate.
Appropriateness of Different Forms of Negative Utilitarianism
Simple Taxonomy of Negative Utilitarian Belief Systems
In Toby Ord’s piece, “Why I’m Not a Negative Utilitarian”, the following taxonomy of negative utilitarian (NU) belief systems is described:
While no explicit definitions of suffering and happiness are given in the article, I somewhat tautologically note here that suffering is associated with all forms of discomfort and happiness is associated with all forms of comfort.
Absolute NU and Lexical NU are Obviously Inappropriate
Ord accurately notes that Absolute NU fails to distinguish good things from even better things, failing requirement 3 on my list. Furthermore, lexical NU, while not technically failing any of the items, implies that no moral trade-offs between good and bad things exist, which is against the spirit of the list. Lexical threshold NU technically gets a pass in some scenarios by allowing some level of trade-off, and weak NU experiences no issues with my above requirements.
Lexical Threshold NU has Unusual Implications
Ord notes that
“If you believe in Lexical Threshold NU, i.e. that there are amounts of suffering that cannot be outweighed by any amount of happiness, then you have to believe in a very strange discontinuity in suffering or happiness. You have to believe either that there are two very similar levels of intensity of suffering such that the slightly more intense suffering is infinitely worse, or that there is a number of pinpricks (greater than or equal to one) such that adding another pinprick makes things infinitely worse, or that there is a tiny amount of value such that there is no amount of happiness which could improve the world by that level of value.”
I agree with this, and several of these possibilities fail my basic requirements if thought about naively. For instance, I believe that there is no arbitrary limit on how much a being can suffer so, if a finite level of suffering ever becomes infinitely morally bad, then there is an even greater level of suffering that is also infinitely bad and thus morally indistinguishable from the lesser amount of suffering. However, this can be evaded by value lexicality, which allows the possibility of multiple types of displeasure which can only be traded off against other displeasures of the same type. Furthermore, the statement “there is a tiny amount of value such that there is no amount of happiness which could improve the world by that level of value” seems weird but does not break any of my basic requirements. Further discussion is required.
Reasonable Weak NU Ethics and Utilitarian Ethics Are Hard to Distinguish
Weak NU is hard to pin down because specific definitions of suffering and happiness are ambiguous. Nevertheless, Ord notes that if explicit definitions are applied, negative utilitarianism seems superfluous. Consider the following examples:
In short, the key issue is that magnitudes of suffering and happiness are only provided after their morality has been evaluated to an extent. From this, I conclude that someone who seems to subscribe to a reasonable form of weak NU ethics might simply be a utilitarian who evaluates a given negative experience as more intensely negative than others.
For this reason, I am not very interested in distinguishing weak NU philosophies and utilitarian philosophies right now because I find the distinction to be practically useless.
Lexical Threshold Negative Utilitarianism
I find that lexical threshold negative utilitarianism (LTNU) is the only form of NU that seems reasonable to me while simultaneously having very different ethical implications than utilitarianism in important contexts. Filtering NU down into only those forms which avoid Ord’s Worse-For-Everyone Argument, I find two frameworks for NU ethics that I take seriously: happiness apprehensive negative utilitarianism (HANU) and lexically distinguishing negative utilitarianism (LDNU) which respectively imply that an arbitrary amount of happiness only has a finite moral value, and that some kinds of suffering can be lexically worse than others.
Lexical threshold NU (LTNU) is the only form of NU that aligns with my most basic intuitions about ethical systems while simultaneously being different enough from conventional utilitarianism to merit consideration relative to it. As mentioned earlier, LTNU essentially corresponds to the statement that “there are amounts of suffering that cannot be outweighed by any amount of happiness.” This does not, at a glance, seem unreasonable to me and it has some important implications. If we determine the value of existence using expected value calculations, for instance, LTNU significantly weakens how compelling very good universes are relative to very bad universes and may lead people to conclude that non-existence is favorable. Since this is a significant conclusion, understanding the origin and validity of LTNU ethics is critical.
Filtering LTNU by Ord’s Worse-For-Everyone Argument
Ord’s main criticism of NU, which applies to every type of it, is the Worse-For-Everyone argument. This argument amounts to showing that NU in general encourages decisions that are worse-for-everyone because they often fail to capture the ways that people make trade-offs between good things and bad things. I agree with this latter argument and it is essentially the same basis that I have for treating reasonable weak NU ethics and utilitarian ethics as the same. Thus, I am only considering LTNU models that reduce to conventional utilitarian ethics in the low intensity region. Ord recognizes that these models avoid the Worse-For-Everyone Argument stating
“Lexical Threshold NU could avoid this in cases below its threshold if it said that happiness and suffering were equally important below that level, but this would create a particularly odd kind of discontinuity and wouldn’t seem to satisfy their intuitions either.”
Since Ord and I both only take ethical systems seriously if they avoid the worse-for-everyone problem, further consideration of it is not necessary. Furthermore, unlike conventional utilitarianism, LTNU does capture an intuition I have that a day in hell cannot be outweighed.
Two Classes of Reasonably Intuitive NU
Ord’s second criticism, which applies specifically to LTNU, is that such frameworks produce a “very strange discontinuity in suffering or happiness.” The details and implications of these discontinuities change depending on the specific type of LTNU under consideration so there is no general counterargument. Fortunately, the three basic attributes of moral systems I care about, along with filters by the worse-for-everyone argument, produce two classes of LTNU that still merit serious consideration. In short, reasonable LTNU can be divided into two classes: those that accept value-lexicality and those that do not, and each may have different implications about discontinuity.
If I ignore value-lexicality and demand my three basic requirements for moral systems, I must ignore both forms of LTNU that allow suffering to yield infinite moral disvalue and forms that do not distinguish between different levels of happiness. This leaves me with figure 1a which satisfies all my intuitions without value-lexicality. After all, more suffering is always lower in value than less suffering and more happiness is always higher in value for the world than less happiness. Since this form of NU does not technically involve lexicality but instead involves some moral apprehension about increasing happiness, I refer to it as happiness apprehensive negative utilitarianism. Arguably, this was never a form of LTNU but I have categorized it as such based on Ord’s convention.
Figure 1. Illustration of reasonable LTNU frameworks without value-lexicality (a) and with value-lexicality between A-suffering and B-suffering (b). The happiness and suffering axes correspond to overall happiness and overall suffering in the world. Note that value-lexicality is not necessarily limited to a single discontinuity as shown in (b) and is not necessarily limited to the suffering side. The plot is merely intended to demonstrate the concept behind lexicality (some suffering being infinitely worse than other suffering while still being weighed against suffering of similar quality) to begin with.
If I accept value-lexicality, then the apparently infinite discontinuities on the moral value axis can still satisfy my three basic requirements since sufferings can be separated out into distinct classes of badness which can only be traded off against other sufferings in classes of equivalent or worse badness. An example of this is shown in figure 1b which shows that A-Suffering may be traded off against happiness, but B-Suffering is always infinitely worse than any amount of happiness. Since this form of NU lets distinctions between qualitatively distinct categories of suffering take over, I will just refer to this type of NU as lexically distinguishing negative utilitarianism (LDNU). Technically speaking, LTNU would be just as good an acronym since this model indeed involves lexical thresholds, but I avoid this in order to avoid confusion with Ord’s convention.
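The ordering described for figure 1b can be pictured as a lexicographic comparison. The sketch below is only an illustration under assumptions that are not in the original post: the `WorldState` fields, the numeric scales, and the `ldnu_key` helper are all hypothetical names, and happiness and A-Suffering are assumed to trade off 1:1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WorldState:
    """Hypothetical summary of a world; all quantities are non-negative."""
    happiness: float     # total happiness
    a_suffering: float   # ordinary suffering, tradeable against happiness
    b_suffering: float   # lexically worse suffering


def ldnu_key(w: WorldState) -> Tuple[float, float]:
    """Comparison key where a larger tuple means a morally better world.

    B-Suffering is compared first; happiness minus A-Suffering is only
    consulted when two worlds contain equal B-Suffering.
    """
    return (-w.b_suffering, w.happiness - w.a_suffering)


def is_better(x: WorldState, y: WorldState) -> bool:
    """True if world x is strictly better than world y under this sketch."""
    return ldnu_key(x) > ldnu_key(y)
```

Because Python compares tuples element by element, no amount of happiness in the second slot can compensate for extra B-Suffering in the first, while B-Suffering can still be weighed against other B-Suffering.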
Review of Happiness Apprehensive Negative Utilitarianism
After reviewing HANU in greater detail, I find that, while my basic intuitions are satisfied and problems with discontinuities are dissolved, HANU has issues with ill-defined complexity, a tendency to reduce itself to lexical negative utilitarianism (which I reject) under reasonable assumptions about the universe, and a failure to capture negative utilitarian intuitions. These issues ultimately lead me to suspect that a bigger problem is at play.
Evaluating Issues with Discontinuities
When Ord criticizes discontinuities in LTNU, he accurately enumerates several types of discontinuities. HANU specifically is vulnerable to the observation that “there is a tiny amount of value such that there is no amount of happiness which could improve the world by that level of value.” I can phrase this statement in two ways. In order to accept HANU, my intuition must accept the ideas incorporated in both these phrases.
Phrasing 1 is not problematic for me because, if there are amounts of suffering that can never be outweighed (which feels intuitive to me), then the idea that the amount of required happiness to outweigh suffering explodes to arbitrarily large numbers right before that threshold seems natural. In short, it is similar enough to the original intuition that it does not bother me.
In contrast, phrasing 2 feels more problematic to me. This is expected because the phrasing is a specific consequence of the HANU model in figure 1 rather than a universal aspect of all LTNU models. In short, should doubling the amount of happiness really have only a marginal moral value? On one hand, a twice-as-happy universe seems like it should generally be twice as good. On the other hand, if the universe is already having an amazing time and someone offers to make that time twice as amazing, my intuition sees this development as relatively unexciting from a self-interested perspective and even less exciting from a moral perspective. This applies regardless of whether the awesomeness represents people’s individual degrees of happiness or the number of people who are happy.
In any case, HANU does not violate any of my basic intuitions or seriously bother me with discontinuities. For these reasons, I share Simon Knutsson’s perspective that Ord’s arguments against NU are not fully persuasive due to a failure to consider HANU. Nevertheless, Ord’s arguments are not the only arguments that exist against this form of NU.
Weird for Everyone Arguments and Simplicity Intuitions
Even if HANU captures my basic intuitions, it is far more complex as an ethical model than classical utilitarianism. Classical utilitarianism is essentially a straight 1:1 line with all complexity built into how happiness and suffering are defined. In contrast, HANU can be instantiated with any arbitrary function of happiness that starts out as a 1:1 line and asymptotically approaches some unclearly defined value as it increases. When should the function deviate from a straight line and what intuition would produce it? The tempting answer is that it could be some average of what everyone wants it to be, but most people have not seriously considered HANU to begin with. Furthermore, because of disagreements about this and personal variance, HANU may be weird for everyone even if it is not necessarily worse. This can technically be avoided if HANU has an exact straight line until it begins to gradually deviate at some atypically large happiness level, but this value must also be specified. Even more fundamentally, what should the happiness threshold be? Does it level off when ten people are a little happy, when a thousand people are extremely happy, or some other number?
As a moral realist who has seen complexity reduction as a motivation for expanding my moral circle, I see complexity issues like these as actual issues and, unless I or someone else comes up with a principled way to resolve them in the context of HANU, will treat these questions as a problem for that ethical system.
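To make the underdetermination concrete, here is one of the many functions that would fit the description of HANU: 1:1 on the suffering side, starting at a 1:1 slope on the happiness side, and flattening toward a ceiling. This is a minimal sketch, not the shape from figure 1a; the exponential form and the `cap` parameter are assumptions chosen purely for illustration.

```python
import math


def hanu_value(net: float, cap: float = 100.0) -> float:
    """Moral value of a world with net happiness `net` (negative = net suffering).

    `cap` is a hypothetical ceiling on the value of happiness; nothing in
    HANU itself says what it should be.
    """
    if net <= 0:
        # Suffering side: plain 1:1 line, unbounded below.
        return net
    # Happiness side: slope 1 at the origin, approaching `cap` from below.
    # Equivalent to cap * (1 - exp(-net / cap)), written with expm1 for accuracy.
    return -cap * math.expm1(-net / cap)
```

With `cap=100`, doubling happiness from 500 to 1000 adds less than one unit of moral value, whereas with `cap=1000` the same doubling adds hundreds. Every verdict about large amounts of happiness depends on a parameter the framework does not pin down, which is the complexity worry raised above.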
Potential Issues in Trading Unlikely Torture for High Probability Pleasantness
The intuition that there are experiences no amount of happiness can make up for is not so different from a similar intuition that the possibility of large amounts of suffering is always unacceptable no matter what. If this latter intuition is one’s motivator for considering negative utilitarianism over utilitarianism, then HANU is a poor choice. Since HANU disallows infinite moral disvalue due to suffering while allowing happiness to have finite positive moral value, one is encouraged to accept a sufficiently small risk of incomprehensibly horrible suffering in exchange for a sufficiently large chance of a moderate amount of happiness.
If this is indeed the sentiment, one might conclude that HANU is insufficiently negative-utilitarian and resort to lexical negative utilitarianism. However, as discussed earlier, I personally do not consider lexical negative utilitarianism reasonable and am willing to accept extremely low probabilities of mild unpleasantness for high probabilities of pleasure.
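The trade described in this section can be written out as a short calculation. It is a sketch under two assumptions not stated in this form in the post: decisions are made by expected moral value relative to a status quo valued at zero, and the symbols $v$, $D$, and $p$ are introduced here only for illustration.

```latex
% v > 0 : finite moral value of the moderate happiness on offer
% D > 0 : finite moral disvalue of the horrible suffering (HANU keeps D finite)
% p     : probability of the suffering outcome; 1 - p : probability of the happiness
\mathbb{E}[V] = (1 - p)\,v - p\,D > 0
\quad\Longleftrightarrow\quad
p < \frac{v}{v + D}
```

Because $D$ is finite, the threshold $v/(v+D)$ is strictly positive, so some nonzero risk of the horrible outcome is always acceptable. Only in the limit $D \to \infty$, which corresponds to a lexical treatment of that suffering, does the acceptable risk shrink to zero.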
Potential Issues in Trading Low Frequency Torture for High Frequency Pleasantness
Given that HANU allows probabilistic trade-offs, it is worth noting that hypothetical probabilities can often be re-interpreted as frequencies by switching between Bayesian and frequentist interpretations of probabilities. For instance, given the many worlds theory of quantum mechanics and the assumption that our uncertainties reflect quantum measures on the average, accepting a one-in-a-thousand chance of incredible suffering to acquire some pleasure would literally mean allowing a person in one of every thousand universes with conditions indistinguishable from the experienced universe to experience incredible suffering. In other words, the probabilistic trade-off becomes a situation where an unacceptable amount of suffering is truly traded off.
One can attempt to remedy this by applying HANU to the amounts of suffering and happiness in every universe in the multiverse instead of the amounts in a single world. However, if arbitrarily small probabilities or (equivalently) an arbitrarily large number of universes are allowed, then an arbitrarily large amount of happiness is present in the multiverse. When happiness is at arbitrarily large levels, the moral benefit of increasing happiness is arbitrarily small for an arbitrarily large number of people. This, again, causes HANU to become lexical negative utilitarianism in the same way that preventing probabilistic trade-offs does.
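The collapse into lexical negative utilitarianism can be sketched formally. The notation below is an assumption made for illustration: $V$ is taken to be HANU's value function for total happiness $H$, nondecreasing with least upper bound $C$, and the disvalue $d$ of a fixed amount of suffering is assumed not to shrink as $H$ grows.

```latex
% V nondecreasing with \sup_H V(H) = C < \infty, so V(H) \to C as H \to \infty.
% For any fixed happiness increment \delta > 0:
0 \;\le\; V(H + \delta) - V(H) \;\le\; C - V(H) \;\xrightarrow[H \to \infty]{}\; 0
% Hence for any fixed disvalue d > 0 there is an H_0 such that
V(H + \delta) - V(H) < d \quad \text{for all } H \ge H_0 .
```

Against a sufficiently happy multiverse, the same bound holds for every $\delta$, so no additional happiness outweighs even a small fixed amount of suffering, which is the behavior of lexical negative utilitarianism.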
Since I often accept small amounts of discomfort to gain small amounts of comfort, I still reject lexical negative utilitarianism though admittedly imagining the probabilistic suffering as literally happening has raised my intuition’s amenability towards it. However, because I would like my ethical philosophy to be robust to becoming lexical negative utilitarianism in the context of plausible metaphysical models, I consider this to be a serious criticism.
Issues with Capturing Negative Utilitarian Intuitions
HANU functions as a negative utilitarian ethical system by not treating a large amount of happiness as much worse than an even larger amount of happiness. This does not reflect the fact that most people come to negative utilitarianism through a recognition of how awful pain is:
“Yet negative utilitarianism doesn’t derive from self-hatred or some nihilistic death-wish. It stems instead from a deep sense of compassion at the sheer scale and intensity of suffering in the world. No amount of happiness or fun enjoyed by some organisms can notionally justify the indescribable horrors of Auschwitz. Nor can it outweigh the sporadic frightfulness of pain and despair that occurs every second of every day.”—David Pearce
While one might argue that the way pain gets very bad very fast relative to happiness in the HANU model does indeed capture this feeling, this argument is not fully convincing to me. I think that, if an ethical system is inspired to capture an intuition about something, then that system should treat that something differently instead of just modifying the things around it.
Furthermore, while negative utilitarian ethicists usually discuss very awful pains, nothing in the HANU system prevents the world from exchanging many small discomforts for a large one. Is exchanging torture for a massive amount of mild pleasure that much worse than exchanging torture for the banishment of a massive amount of mild displeasure? My intuition does not care, but HANU, along with strong negative utilitarianism and lexical negative utilitarianism, does.
Review of Lexically Distinguishing Negative Utilitarianism
I review LDNUs and note serious issues with discontinuities and complexities that might be justified by a believably lexically worse category of suffering. I resolve many of LDNU’s conflicts with my more complicated intuitions about negative utilitarianism by introducing intensity focused LDNUs (IF-LDNUs) which describe moral value/disvalue contributions of people to the world rather than just the overall moral value/disvalue of the world itself. Based on this, I believe that IF-LDNU frameworks may be the only forms of negative utilitarianism that I might find personally compelling. However, to compel me, these too require a believably lexically worse category of suffering.
Evaluating Issues with Discontinuities
While HANU does not experience any serious discontinuities, most LDNU frameworks do. For instance, in figure 1b, some suffering threshold must be reached to yield a transition between A-Suffering and B-Suffering. This corresponds with Ord’s criticism that “there are two very similar levels of intensity of suffering such that the slightly more intense suffering is infinitely worse, or that there is a number of pinpricks (greater than or equal to one) such that adding another pinprick makes things infinitely worse.” Like Ord, I agree that this is a very unintuitive implication of an ethical system. When I think of unpleasantness, I cannot imagine that two levels of suffering, only marginally different in intensity, also have arbitrary differences in moral disvalue. David Pearce, a negative utilitarian, is also willing to admit that this issue is hard to get around.
Simon Knutsson attempts to defend against this criticism by noting that most lexical classifications are not sharp distinctions:
“One can, for example, hold that there are are different orders or levels of suffering and that the separation between them is not sharp in the way Ord describes it. There are different versions of such non-sharp distinctions, but here is just a simple example to illustrate. Say that someone claims that there are bald and non-bald humans, must that person hold that there is some sharp line such that if one adds a single hair, a person would go from bald to non-bald?”—Simon Knutsson
However, while this argument might work in the general context of human categorization alone, it fails to avert actual discontinuities in decision-making. For instance, if A-Suffering and B-suffering are two lexically different levels of suffering with B-Suffering being arbitrarily worse than A-suffering, then a state of almost all A-Suffering and an infinitesimal amount of B-Suffering is infinitely worse than a state of all A-Suffering. Consequently, if it is ever possible for a small amount of added suffering to shift one’s categorization of overall suffering from completely A-suffering to maybe-slightly-almost B-suffering, the moral disvalue of the suffering goes to infinity. This yields a discontinuity as shown by the solid line in figure 2b.
Figure 2. Depictions of the fuzzy categorizations of B-Suffering on a scale from certainly no B-Suffering to certainly B-Suffering as suffering and happiness are varied (a) and how those categorizations inform moral value and disvalue (b). The solid “always non-negative” line depicts a situation where one confidently identifies sufferings below a threshold as having no B-Suffering while the dashed “always above zero” line depicts a situation where one is never fully certain of the absence of B-Suffering even in happy states. Note that moral disvalue due to A-Suffering or moral value due to happiness are not visible on the scale next to the arbitrarily massive disvalues from B-Suffering.
One can avoid this discontinuity by treating the degree of B-Suffering as always non-zero even in the presence of happiness. On one hand, this seems strange because figures 1 and 2 both depict suffering and happiness on different sides of the same axis so, if B-Suffering exists simultaneously with happiness, it differs fundamentally from typical suffering. On the other hand, treating suffering and happiness as represented by a single variable is not itself necessary. Moreover, if the amounts of happiness and suffering in figure 2 correspond to amounts in the world rather than within a person (as they do in figure 1), the single-axis presentation is obviously incorrect: if one person is happy and one person is suffering, both exist simultaneously. Thus, I am open to this recontextualization.
If this is all accepted, it has interesting implications. Because the moral disvalue of B-Suffering is essentially infinitely greater than moral disvalue from A-Suffering or other sources of moral value, changing the degree of B-Suffering in the world is now infinitely more important than any other concern. More strangely, because every level of suffering and happiness has some associated nonzero degree of B-Suffering, trade-offs between any amounts of A-Suffering and happiness are permitted so long as the degree of B-Suffering is minimized. In principle, I see no counterintuitive issues with discontinuities and think that this approach might be reasonable if B-Suffering is really that bad.
Note that, in either case, even if the categorization of B-Suffering ever falls exactly to zero (which is likely in a discrete physical world), these sorts of infinite discontinuities may be justified in the context of how B-Suffering is defined and excused as artifacts of reality rather than of the ethical system. However, B-Suffering is not defined so I cannot comment conclusively.
Technically, a third method also exists wherein a continuum of lexically distinct forms of suffering is assigned to every distinct level of suffering so that the nth level of suffering exclusively corresponds to the nth type of suffering. However, this naturally implies a lexical threshold exactly at the first distinct form of suffering and thus is an even more limiting ethical system than lexical negative utilitarianism. Since I reject lexical negative utilitarianism, no further discussion of this method is warranted.
Weird for Everyone Arguments and Simplicity Intuitions
If the discontinuous variant of LDNU is used, the weird for everyone arguments that may face HANU are completely avoided by virtue of LDNU behaving just like utilitarianism prior to some high level of suffering. If the continuous variant is used, weird-for-everyone arguments manifest depending on the degree to which B-Suffering appears at low intensity suffering and happiness.
These same issues manifest from a simplicity perspective. Discontinuous LDNU requires the seemingly arbitrary specification of a threshold value at which B-Suffering occurs and continuous LDNU requires a defined relationship between the degree of B-Suffering and the amounts of happiness and suffering. The legitimacy of both depends on how B-Suffering is defined and justified.
Modifying LDNU to Avoid High Frequency to High Intensity Trades
I gave HANU significant criticism for its willingness to make probabilistic and potentially literal trades between intense suffering and frequent happiness. This reflects the fact that the HANU happiness threshold must correspond to the entire world’s happiness and consequently that some amount of the world’s suffering will overpower any amount of its happiness. However, none of these parameters care about the kind or intensity of suffering, only the quantification of the amount. Thus, in my opinion, a negative utilitarian framework that accurately captures negative utilitarian intuitions must also capture the intensity of the suffering itself. This means that it must apply on a per-person level rather than a world level because intensity corresponds to an individual’s experience. HANU can never do this because it must always treat changes in people’s experience as having different moral value depending on the number of people in the population (i.e. making two people twice as happy must be less than twice as good as making one person happy).
Fortunately, LDNU can. While bounded moral value on a per-person basis can never yield bounded total moral value in a world with unbounded people, unbounded moral value on a per-person basis typically does yield unbounded moral value in such a world. To create an intensity focused LDNU (IF-LDNU), simply follow these steps
This process can be performed on figure 1b and figure 2b to illuminate how the corresponding IF-LDNU expects suffering’s intensity to impact its lexicality. Overall, IF-LDNUs are much better at capturing one’s intuitions about how bad suffering can feel compared to typical LDNUs or HANU while also better describing people as equally important.
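The difference between a world-level threshold and a per-person threshold can be illustrated with a toy comparison. This is not a reconstruction of the construction steps referred to above; it is a hypothetical sketch in which `THRESHOLD`, the intensity scale, and both key functions are invented for illustration, and happiness is left out for brevity.

```python
from typing import List, Tuple

# Hypothetical intensity above which suffering counts as B-Suffering.
THRESHOLD = 100.0


def world_level_key(sufferings: List[float]) -> Tuple[float, float]:
    """Apply the lexical threshold to the world's total suffering.

    Returns (-B, -A); a larger tuple means a morally better world.
    """
    total = sum(sufferings)
    b = max(0.0, total - THRESHOLD)   # total suffering beyond the threshold
    a = min(total, THRESHOLD)         # total suffering below the threshold
    return (-b, -a)


def per_person_key(sufferings: List[float]) -> Tuple[float, float]:
    """Apply the threshold to each person's intensity, then sum across people."""
    b = sum(max(0.0, s - THRESHOLD) for s in sufferings)
    a = sum(min(s, THRESHOLD) for s in sufferings)
    return (-b, -a)


# A thousand people each feeling a pinprick versus one person being tortured.
pinpricks = [1.0] * 1000
torture = [150.0]
```

Under the world-level key the thousand pinpricks add up past the threshold and rank as worse than the single torture, while under the per-person key no individual pinprick ever reaches the threshold, so the torture ranks as worse. The second behavior is the one an intensity-focused framework is meant to capture.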
A Believable, Lexically Worse, Type of Suffering
Based on my observation that IF-LDNUs seem like the only form of negative utilitarianism that ultimately captures my intuitions, I explore the concept of suffering without a defined bound on how intense it feels. Intuitive and somewhat formal arguments are provided for why one might think such a kind of suffering exists. Moreover, on reviewing it in the context of utilitarianism and preference utilitarianism, I find that its actual existence could be shown or dissolved depending on one’s method of value inference, which is a thorny and potentially impossible problem in the context of artificial intelligence alignment. I also note that arguments for arbitrarily bad suffering imply arguments for arbitrarily good happiness and that incorporating both produces a form of utilitarianism that may be more justifiable than pure negative utilitarianism. Finally, I review practical implications of arbitrarily good and arbitrarily bad experiences and make two observations: arbitrarily bad suffering generally implies that existence is undesirable unless arbitrarily good happiness is also included; and both ethical systems suggest that altruists may want to focus on addressing rare but extremely intense negative experiences over prominent but tolerable negative experiences. Though I am uncertain about whether these ethical systems are the ones I should believe due to my corresponding uncertainty about value inference processes, I am leaning towards giving at least some weight to this belief system.
Suffering Worse Than Anything
The central idea behind lexical characterizations of negative utilitarianism is that some bad experiences can be seen, from a moral perspective, as infinitely worse than other bad experiences. By symmetry, my central intuition for negative utilitarianism is that experiences of intense enough suffering really feel like they are worse than everything. When I imagine myself in such situations, I can envision a level of suffering so intense that I might find myself thinking “I would do anything for this to stop”—a statement literally suggesting that nothing, including other sorts of pains and pleasures, could outweigh a specific experience in any amounts. Considering this statement and genuinely believing it is most of what one needs to be a negative utilitarian.
Whether one should or should not believe this statement, in my opinion, is ultimately the crux of the difference between utilitarian and negative utilitarian ethical systems. On one hand, it is tempting to treat statements like “I would do anything for this to stop” as reflective of a cognitive failure to evaluate the situation fully. In particular, cognitive biases like scope neglect [LW · GW] may be at fault. Furthermore, torture produces significant physiological changes in the brain, including impacts on cognitive processes so statements similar to the one above might be regarded as illogical and foolish.
On the other hand, the badness of suffering itself is a fundamentally subjective notion because interpersonal comparison of experiences is infeasible [LW · GW] without a priori assumptions. Thus, a person’s evaluation of suffering is a subjective review of something that itself must be treated as subjective. Consequently, the line between the evaluation “I would do anything for this to stop” and the experience corresponding to the statement is blurred in a way that the statement itself may be a fundamental part of the experience even if cognitive biases contributed to it. Furthermore, in the same way that utilitarian ethicists might consider the cognitive impacts of torture as invalidating subjective evaluations of it, a negative utilitarian might see those same cognitive effects as fundamentally critical to the nature and experience of the torture itself. In some sense, this is obviously true. The ethical implications of conditions like pain asymbolia which preserve the fundamentals of pain while changing how people care about it on a higher level are still widely discussed with one possible conclusion being that the degree a person cares about pain experiences directly influences whether those experiences constitute suffering. Considering this, the cognitive effects of torture are just one reason that it could be considered fundamentally more awful than regular suffering; the effects themselves change one’s way of thinking about suffering in a way that arbitrarily magnifies how bad the suffering is relative to a more rational state.
These two views reflect two different ways of looking at suffering. The utilitarian takes an outside view of suffering and notes that, for the most part statements like “I would do anything for this to stop” do not make sense since there is generally always a way to increase the level of harm. In contrast, the negative utilitarian takes an inside view and notes that, because suffering is subjective, someone’s feeling about how bad suffering feels is not just an indicator of the degree of suffering but an actual component of the suffering itself. While epistemological uncertainty about this is reasonable, I, speaking from the inside view as a human being, favor the inside view to an extent. In short, I believe that a compelling case can be made that this sort of suffering is in fact lexically worse than other forms of suffering.
A Justification of Arbitrarily Terrible Suffering Based On Simplicity Intuitions
Even for ethicists that generally prefer to take outside views on suffering, certain reasonable assumptions imply that responses to empirical observations of suffering should treat some suffering as lexically worse than other suffering. These assumptions are as follows:
To begin, let A be a being. Because A is a finite physical being, there are only n distinct experiences it can have in total. Consider the case where A is experiencing experience X, and claims “this is the worst thing I can experience.” From 3. there is a reasonable chance that A truly means this so X is objectively (by 1.) worse than n-1 other experiences which we can speculate about based on knowledge of A.
Now let A’ be a hypothetical being. Because A’ is a finite physical being (albeit a more complicated one), there are m distinct experiences it can have in total. Consider the case where A’ experiences X’ and makes the same claim as A. From 3. there is a reasonable chance that X’ is objectively (by 2.) worse than m-1 other experiences.
If A’ is envisioned to be arbitrarily complex so m grows arbitrarily large and A’ does not store very similar experiences because they waste memory, 1. implies that X’ may be an arbitrarily horrible experience. However, an observer comes along seeking to decide whether A is experiencing something like the finitely bad X or the arbitrarily bad X’. If the observer judges solely based on A’s response, this is impossible because A responds just like hypothetical being A’ does when experiencing something much worse. Consequently, statements like “this is the worst thing I can experience” suggest the possibility of arbitrarily bad or lexically worse amounts of suffering relative to those implied by other expressions of discomfort.
However, this argument relies on insufficient assumptions. Because X’ can be arbitrarily worse than X, and the two are indistinguishable, one also must conclude that A’ experiencing X’ is indistinguishable from an arbitrarily better experience (assuming happiness is, like suffering, unbounded). Thus, when someone says that “this is the worst thing I can experience,” that experience can be taken as arbitrarily good or arbitrarily bad. In absence of an interpersonal way of measuring suffering, no value judgements can be made. A fifth assumption must also be made:
To capture reasonable intuitions, I also assume
In practice, one may broaden 6. with the modified claim that beings need not be sufficiently similar for this to hold true and, in cases where two beings seem to have different subjective expectations of the same experience, the beings are having fundamentally different experiences due to their subjective interpretations. This claim that the subjective interpretation of an experience has implications for the experience itself echoes the claim that “I would do anything for this to stop” necessarily implies suffering from suffering associated with a different lexicality.
However, in any case, 6. implies that X must always be about as bad as X’ because the corresponding subjective experiences are identical and therefore X itself must be treated as arbitrarily bad. In short, if the worst thing any being experiences is just as bad as the worst thing any other being can experience, then the badness of this experience is an upper bound on suffering. By 1., this means that the badness of an experience like this must be infinite. Practically, this means that people can experience things that are infinitely terrible.
Of course, this conclusion can be escaped if one rejects 1. and assumes that people (or beings in general if 6. is broadened) can only experience finitely intense suffering. However, because suffering and happiness are effectively unbounded unless some explicit upper bound is placed on them, some suffering threshold (even if it is very large) must be asserted to reject this conclusion. Some people have done this, conceding that subjecting someone to fifty years of uninterrupted and merciless torture is less bad than subjecting 3^^^3 people to a single barely unpleasant dust speck in the eye each [LW · GW] and explaining people’s unwillingness to make the trade as a result of scope insensitivity, that is, intuition’s failure at large numbers. To be fair, this is an extreme case. One can also impose finite suffering by admitting a willingness to trade a large amount of something incredibly unpleasant like bad abuse for something that may physically seem only marginally different like sadistic torture worse than anything else a person can think of.
Regardless, when I try to ask myself what my torture trade-off number is, my inside view gives no answer. Comparing incomprehensible intensities of suffering with incomprehensible quantities of it does not seem to help this process either. Whether caused by scope insensitivity or broader issues with the finitude of my mind, this uncertainty leads me to suspect that I should conservatively treat the maximum intensity of suffering as an arbitrarily large but ill-defined number.
Overall, I think this line of reasoning raises my credence in the possibility of infinite suffering and, in turn, the possibility of multiple, lexically distinct, forms of suffering. As with LDNU and IF-LDNU frameworks, this does imply a discontinuity where, at some point, a small physical change will yield an infinite change in the amount of suffering. However, the above justification gives some intuition for why a discontinuity like that would really exist. Even if the physical change producing it is minor, the shift from a just barely bearable amount of suffering to a fully unbearable amount feels at least a little like it should yield a massive jump in subjective intensity. Furthermore, if this justification even slightly raises one’s credence in infinite suffering above zero, then it cannot consistently coexist with a credence for utilitarianism because the lexically infinite suffering would dwarf the always finite values in utilitarianism. Of course, I need not be consistent and do not find this detail particularly compelling.
Review of Concrete Upper Bounds On Suffering That May Refute Arbitrarily Large Suffering
The above argument boils down to the observation that, unlike negative utilitarianism, conventional utilitarianism relies on the existence of a defined upper bound on how bad the worst thing people can feel is. Consequently, many of the strongest arguments against negative utilitarianism are those that produce an upper-bound. Two of these arguments—the production of upper bounds through observed behavior and the production of upper bounds through physical limitation—are introduced and discussed below. I find upper bounds produced by the former method flawed in the context of existing human behavior and upper bounds produced by the latter method reasonable but insufficiently compelling for me to buy into them completely.
No Justifiable Upper Bounds On Suffering from Observed Behavior
One method for attempting to refute arbitrarily great suffering is to propose an upper bound based on people’s willingness to risk incomprehensible suffering. For example, people get into car accidents every day, and a small percentage of these people may sustain massive injury and pain before dying, which indicates an elevated risk of incomprehensible suffering. The lifetime odds of dying in a car accident are about one in six hundred so, if one considers the worst 0.01 percent of car accident deaths as incomprehensibly unpleasant, then being near cars yields a one in six million chance of something infinitely horrible happening. One therefore, roughly speaking, concludes that the maximum badness of suffering is no greater than the good of six million lifetimes of enjoying the convenience of car usage. Naturally, this quick estimate has several methodological flaws but, even with these, one can be reasonably confident that the correct upper bound is no greater than twenty orders of magnitude above that calculation. An upper bound on the upper bound of how bad suffering intensity gets is therefore confidently below the good associated with the convenience of using a car for 10^27 lifetimes. Since humanity has defined an upper bound here, arbitrarily great suffering does not seem practically justified.
However, while this upper bound may seem sensible, it hides the equally empirical nature of negative utilitarian intuitions. While a person choosing to risk incomprehensible suffering demonstrates the existence of a trade-off point empirically, a person feeling incomprehensible suffering who is willing to bargain anything to make it stop demonstrates the non-existence of a trade-off point empirically as well. Choosing to focus on the former rather than the latter in empirically deciding trade-off points reflects a deliberate bias against the actual feelings this form of negative utilitarianism is based on. Furthermore, most people have never experienced incomprehensible suffering, so favoring their opinions over those of people experiencing it seems likely to lead to inaccuracies. Thus, the upper bound is not empirically sound.
Furthermore, people’s choices do not necessarily capture all the intuitions that they have. Instead of imagining the torture as having a one-in-six-million chance of happening, imagine a single immortal being who can choose between being able to use cars for reasons unrelated to incomprehensible suffering prevention for the next 10^27 lifetimes or immediately experiencing incomprehensible suffering. What choice should be made? On one hand, the math above suggests that the incomprehensible suffering is the better option. On the other, living without a car is still overall a good experience and the future ability to drive one seems like it will be little solace while the being suffers incomprehensibly. The answer does not seem obvious and, when people make empirical decisions, this is rarely the sort of question that they pay attention to. Thus, the upper bound does not accurately capture what people would seriously choose if they thought about it.
Between these two issues of biasing samples against people suffering incomprehensibly and measuring answers to irrelevant questions, empirical upper bounds on the intensity of suffering are both methodologically flawed and fail to refute the intuitions which lead people to consider that pain can become arbitrarily bad. Moreover, if these issues were rectified, the empirical upper bound would tend to infinity: people subjected to incomprehensible suffering would declare it arbitrarily bad and people trying to answer the right questions would be uncertain about their answers. Thus, empirical observations do not constitute a satisfactory argument against this form of negative utilitarianism.
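The back-of-the-envelope estimate in this subsection can be reproduced in a few lines. This is only a sketch that takes the two stated inputs at face value (lifetime odds of about 1 in 600, and the worst 0.01 percent of those deaths); the variable names are mine, and the twenty-orders-of-magnitude padding is the safety margin assumed in the text rather than anything derived.

```python
from fractions import Fraction

# Inputs taken from the text; names are illustrative only.
lifetime_odds_fatal_crash = Fraction(1, 600)   # "about one in six hundred"
worst_share_of_deaths = Fraction(1, 10_000)    # the worst 0.01 percent of those deaths

# Chance per lifetime of the "incomprehensibly unpleasant" outcome.
p_incomprehensible = lifetime_odds_fatal_crash * worst_share_of_deaths

# Lifetimes of car convenience implicitly traded per expected occurrence.
lifetimes_per_event = 1 / p_incomprehensible

# Safety margin assumed in the text: twenty orders of magnitude.
padded_bound = lifetimes_per_event * 10**20

print(p_incomprehensible)      # 1/6000000
print(lifetimes_per_event)     # 6000000
print(padded_bound < 10**27)   # True
```

Exact fractions are used so that no floating-point rounding muddies an estimate that is already very rough.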
Somewhat Justifiable Upper Bound from Physical Constraints
A stronger and more fundamental method for proposing upper bounds relies on the physical limitations of beings capable of experiencing suffering. For example, the human brain typically has about 10^11 neurons and, if each can be represented by a thirty-two-bit number, this corresponds to roughly 10^12 bits which, if flippable in any fashion, implies around 2^(10^12) states. Thus, the brain only distinguishes, at most, around 2^(10^12) experiences. Given the assumptions that suffering is relative to the states themselves and that no lower level principle allows two pairs of adjacent states to have different differences in well-being, one can conclude that the worst state can only be 2^(10^12) times worse than the best state. This implies an upper bound.
Unlike predicting upper bounds from empirical behavior, this method relies on more fundamental assumptions about how suffering might emerge in the mind and therefore has more strength. However, from another perspective, the specific assumptions associated with the idea that subjective experiences like suffering are regulated by physical differences between states rather than subjective differences are not particularly convincing. Moreover, for practical purposes, an upper bound of 2^(10^12) is massive. Unless a lower upper bound can be established, finitely bad suffering on the order of 2^(10^12) still seems like it would eclipse concerns about typical experiences of suffering. Without an even more explicit model about how experiences are generally had, treating the maximum intensity of suffering as potentially arbitrarily large seems like the better strategy.
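To get a feel for how large this physical bound is, the state count can be sized using the neuron count and bits-per-neuron figures given above. This is a rough sketch: the number of states is far too large to compute directly, so the code works with its base-10 logarithm, and the variable names are my own.

```python
import math

neurons = 10**11        # figure from the text
bits_per_neuron = 32    # figure from the text

# Total bits if every neuron is described by one 32-bit number.
bits = neurons * bits_per_neuron   # 3.2e12, i.e. on the order of 10^12

# The state count is 2**bits; compute log10 of it instead of the number itself.
log10_states = bits * math.log10(2)

print(f"bits = {bits:.1e}")                           # bits = 3.2e+12
print(f"states ~ 10^({log10_states:.2e})")            # states ~ 10^(9.63e+11)
```

Written in decimal, the number of states would have roughly a trillion digits, which is why a bound of this kind does little practical work even if its assumptions are granted.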
Valuing Subjective Evaluations of Experiences Does Not, Overall, Seem More Complicated Than Existing Problems in Conventional Ethical Systems
Beyond proposals for creating upper bounds on the intensity of suffering, another more serious criticism of treating worst possible experiences as infinite suffering relates to the introduction of subjective evaluation as relevant to suffering in the context of beings that may not clearly perform them. However, this is relatively unimportant to me as most agent-motivated ethical systems already deal with issues that, when resolved, will necessarily answer the question of whether the complexity is justified. In particular, these systems need additional information on how to distinguish between choices due to irrationality and choices due to preferences even if Occam’s razor is assumed. Thus, despite negative utilitarianism including an individual’s subjective evaluation of how bad their experience is in the generally agreed-upon view of how bad it is adding complexity, conventional forms of utilitarianism already have that complexity in their requirement for some evaluation of how one ought to distinguish evaluations of bad experiences from unwanted limitations on actions or cognitive biases. While valuing a subjective evaluation of an experience’s badness is different from instrumentally using an unspecified method of inferring values from irrational behavior to identify an experience’s badness, defining the latter method completely determines whether the former process is justified.
Furthermore, certain proposals for that method imply arbitrarily bad suffering. Approval upon reflection explicitly relies on an agent’s subjective evaluation in a way that reinforces arbitrarily bad suffering so long as the agent treats intense torture as something that cannot be outweighed by small pleasures/relief from small pains after extensive reflection. This is often true both in the context of the inside-view perceptions I discuss earlier and in things people have stated. Another method, using regret [LW · GW] implies arbitrary suffering even more strongly; it is hard to imagine a person who chooses torture not immediately wishing to undo it given the opportunity–even if finishing the torture provides a great reward. Thus, even if considering self-evaluation of one’s subjective experience as part of the value of an experience presents complexity, this complexity is no worse than existing problems in conventional ethical systems.
Agent-Based Generalization of Arbitrarily Bad Suffering in Worst Possible Experiences
To further expound upon negative utilitarianism involving arbitrarily bad suffering, one can introduce semi-formal models of how such an ethical system works in the context of agents with the acknowledgement that shifting from agent models to physical world models poses challenges for conventional ethical systems in general. Two main formalizations emerge here. The first, motivated solely by earlier arguments about a being’s worst possible experience, discusses the defining of a being’s worst possible experience as arbitrarily bad. The second, motivated by more general intuitions about some kinds of suffering feeling lexically worse than other kinds, allows for many different but unpleasant experiences to all be treated as lexically bad.
Formalization for Arbitrarily Bad Suffering in Worst Possible Experiences
If satisfied and dissatisfied preferences are a good generalized model of suffering, then one agential strategy that points toward a negative utilitarian morality acknowledging infinitely bad suffering is simple: when inducing the utility function of a given agent, treat a given situation as having arbitrarily bad disutility if either the agent is never predicted to act for the sake of arriving at that state or if the agent is predicted to act for the sake of exiting that state into any other circumstance. These two qualifiers for arbitrarily bad suffering reflect the preferences of the agent prior to the experience and the preferences during the experience. If an experience involves arbitrarily great suffering, an agent in either of those situations should behave as if it really does.
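The rule above can be sketched as a small function. Everything here is hypothetical scaffolding: `StatePrediction` and its fields stand in for whatever an agent model would actually predict, and `math.inf` is only a stand-in for "arbitrarily bad", not a claim about how lexical disvalue should be represented.

```python
import math
from dataclasses import dataclass


@dataclass
class StatePrediction:
    """Hypothetical summary of what an agent model predicts about one state."""
    ever_sought: bool          # is the agent ever predicted to act in order to reach this state?
    exits_to_anything: bool    # once in it, is the agent predicted to leave for any other circumstance?
    finite_disutility: float   # fallback value from some ordinary, finite inference method


def inferred_disutility(p: StatePrediction) -> float:
    # Either qualifier alone is enough: the preference before the experience
    # (never sought) or the preference during it (exit at any cost).
    if (not p.ever_sought) or p.exits_to_anything:
        return math.inf
    return p.finite_disutility


stubbed_toe = StatePrediction(ever_sought=True, exits_to_anything=False, finite_disutility=1.0)
torture = StatePrediction(ever_sought=False, exits_to_anything=True, finite_disutility=1e6)

print(inferred_disutility(stubbed_toe))  # 1.0
print(inferred_disutility(torture))      # inf
```

The stubbed toe counts as "ever sought" in the sense that an agent might accept it as the price of something else; that reading of the first qualifier is my assumption.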
Formalization for Extending Arbitrarily Bad Suffering Beyond Single Worst Possible Experiences
Though the model above captures many of the negative utilitarian intuitions I care about, it does not fully capture human-level concepts about what the intensity of suffering corresponds to. While the above discussion centers around a single experience worse than any other in the mind of the experiencer, people can consider multiple distinct experiences to be unbearable and often even rank which unbearable experiences are worse than others. For instance, the experience of having one’s right leg flayed of skin may be considered as unbearable torture unjustifiable by any amount of relief from small pains or gains from small pleasures but is still probably better than the experience of having both of one’s legs flayed of skin simultaneously. While both experiences involve the flaying of skin, the latter clearly has more of it despite none of the flaying itself constituting a more intuitively intense form of torture. This suggests that defining intensity of suffering as how bad a given experience is lacks accuracy: even within a single person having a single experience, the amount of pain caused from injuries is related both to the amount of injuries (up to the extent when one’s awareness of injury is saturated) and the intensity of those injuries. The latter form of intensity, rather than general rankings of an experience’s badness, seems to be the kind that pertains most significantly to negative utilitarian intuitions. Two methods exist to capture this re-definition of intensity from an evaluation of how bad an experience is to something more complicated:
In either case, some formal generalization exists for how to treat the moral value and disvalue of experiences in the context of agents.
Motivations for Arbitrarily Bad Suffering Tend to Suggest Arbitrarily Good Pleasure
Whether one defends arbitrarily bad suffering on the grounds that people’s evaluation of their experiences relates directly to the experiences themselves or on the grounds that no clear limit exists on how bad the worst experience a being can have is, both defenses symmetrically apply to happiness. This indicates the possibility of arbitrarily good happiness. After all, just as a being can think “this is worse than anything else that could happen to me,” they can also think “this is better than anything else that could happen to me.” In the absence of other arguments for privileging arbitrarily bad suffering over arbitrarily good happiness, one therefore must accept the significance of both. This yields a form of non-negative utilitarianism which contains many intuitions associated with negative utilitarianism itself while avoiding a bias towards suffering itself.
Nevertheless, this utilitarianism does not seem as compelling to me as negative utilitarianism biased towards suffering. I think this is the case because I see myself as cognitively closer to someone experiencing arbitrarily bad suffering than someone experiencing arbitrarily good happiness. From a practical perspective, I can envision specific situations I could be in that would lead to feeling what seems to be arbitrarily bad pain while I can envision no such situations leading to arbitrarily good happiness. More generally, by virtue of humans being more likely to experience extremely bad things than equivalently extremely good things, I would expect us as a species to be biased against the valuation of arbitrarily good happiness, which may only benefit beings very different than ourselves. Given both that anthropocentric biases explain devaluing arbitrarily good happiness, and that I prefer ethical systems that are not species-specific, I give more credence to general forms of utilitarianism that allow for arbitrarily intense positive or negative experiences than more limited forms of negative utilitarianism. As a side-note, the symmetry justifying arbitrarily good happiness also allows the earlier agential formalizations of arbitrary suffering to be trivially extended to include arbitrary happiness.
Practical Consequences
As soon as experiences arbitrarily better or arbitrarily worse than others are introduced into an ethical system, the possibility of those experiences immediately dwarfs any infinitely smaller concerns, leading to a system of decision-making very unlike the one that people use. This has significant implications as existing generally increases the probability of having any given kind of experience. Consequently, selfish negative utilitarians only considering the possibility of arbitrarily bad/lexically worse suffering should prefer painless death over their continued existence. If they also consider arbitrarily good happiness, life may have value but, for suffering and happiness agnostic beings, only if that kind of happiness is more likely than arbitrarily bad suffering. In reality though, the opposite is typically true, especially for humans. As a side-note, people who are not selfish may still be motivated to exist purely because their actions may reduce overall suffering even if it raises the likelihood of personal suffering. Still, the collective implication is that all beings should cease to exist if possible.
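One way to see how a lexically worse kind of suffering dwarfs everything else is to compare options as ordered pairs. The sketch below covers only the suffering-only case (no arbitrarily good happiness) and relies on Python comparing tuples lexicographically; the `Option` type, its fields, and the numbers are all illustrative assumptions.

```python
from typing import NamedTuple


class Option(NamedTuple):
    # Field order matters: tuples compare on the first field, then the second.
    p_lexical_suffering: float   # probability of the lexically worse kind of suffering
    ordinary_disvalue: float     # finite net disvalue; negative means net pleasant


def preferred(a: Option, b: Option) -> Option:
    # Lower is better on both axes, and the first axis always takes priority.
    return min(a, b)


long_pleasant_life = Option(p_lexical_suffering=1e-9, ordinary_disvalue=-1000.0)
painless_nonexistence = Option(p_lexical_suffering=0.0, ordinary_disvalue=0.0)

print(preferred(long_pleasant_life, painless_nonexistence) == painless_nonexistence)  # True
```

However small the first number is, no finite value in the second field can compensate for it, which is the behavior that makes this kind of system so unlike ordinary decision-making.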
Somewhat fortunately, selfless people may justify human existence if humanity exists for a long enough time. For instance, if technological advancements allow the creation of beings that exist without the capacity for arbitrarily intense suffering or with a much higher capacity for experiencing arbitrarily good happiness, then existing has long-term value. Unfortunately, because the state of the universe minimizing the possibility of arbitrarily intense suffering likely does not have beings, the former achievement is intractable. This leaves the latter achievement which feels intuitively problematic due to the issue that I, and people in general, cannot conceive of arbitrarily good happiness.
Beyond these questions about whether one should exist, which I and most humans will generally choose to ignore, these ethical systems also make more practical suggestions. People are much more likely to experience very intense amounts of suffering than they are to experience similarly intense amounts of happiness and this means altruists may want to focus on rarer but potentially arbitrarily intense instances of suffering over prominent but tolerable instances of suffering. With the formalizations I introduced earlier, the details of this vary depending on whether one focuses on a single worst-possible-experience or many very-negatively-perceived-experiences. Either way, both seem likely to worry about wild and factory farmed animals subject to extreme physiological agony without any pain-related interventions.
Conclusion
Throughout my review of NU ideologies, I have found strong agreement with almost every argument made by everyone I have read. I agree almost completely with all of Toby Ord’s arguments about why many forms of negative utilitarianism fail to capture intuitions and only depart from him in that some ethical discontinuities like the change from tolerable suffering to arbitrarily bad suffering feel intuitive to me. Moreover, the concept of extremely intense suffering as a motivator for NU has also been explicitly discussed in the context of some suffering feeling infinite (i.e. arbitrarily bad). The fact that I share so much agreement with others gives me confidence that I am in fact talking about the right things here.
Still, even if there is not much new information here, I hope that this write-up has disambiguated a lot of the discussion around negative utilitarianism and expounded on ideas that may not have all been in the same place. I also hope that my discussion of arbitrarily intense experiences has helped shed light on why some NU-like forms of ethics may both have something to them and raise harder questions than they first appear to. I think understanding NU’s relationship with value inference is very important. After all, the issue motivating me to consider negative utilitarianism—how to decide how much to care about a version of myself in an incomprehensibly painful situation with a vastly different mental state—is inherently an issue in value inference, an extremely challenging field.
Considering all this, I am not sure what my ethical framework ought to be. Overall, I think that between my reasons to consider NU-like ethical systems and my human-driven apprehension/uncertainty about arbitrarily good experiences, my ethics will lean somewhat suffering focused.