Posts
Comments
It doesn't matter how many fake versions of you hold the wrong conclusion about their own ontological status, since those fake beliefs exist in fake versions of you. The moral harm caused by a single real Chantiel thinking they're not real is infinitely greater than infinitely many non-real Chantiels thinking they are real.
Interesting. When you say "fake" versions of myself, do you mean simulations? If so, I'm having a hard time seeing how that could be true. Specifically, what's wrong with me thinking I might not be "real"? I mean, if I thought I was in a simulation, I think I'd do pretty much the same things I would do if I thought I wasn't in a simulation. So I'm not sure what the moral harm is.
Do you have any links to previous discussions about this?
"If the real Chantiel is so correlated with you that they will do what you will do, then you should believe you're real so that the real Chantiel will believe they are real, too. This holds even if you aren't real."
By "real", do you mean non-simulated? Are you saying that even if 99% of Chantiels in the universe are in simulations, then I should still believe I'm not in one? I don't know how I could convince myself of being "real" if 99% of Chantiels aren't.
Do you perhaps mean I should act as if I were non-simulated, rather than literally being non-simulated?
Thanks for the response, Gwern.
he is explicit that the minds in the simulation may be only tenuously related to 'real'/historical minds;
Oh, I guess I missed this. Do you know where Bostrom said the "simulations" can be only tenuously related to real minds? I was rereading the paper but didn't see mention of this. I'm just surprised, because normally I don't think zoo-like things would be considered simulations.
This falls under either #1 or #2, since you don't say what human capabilities are in the zoo or explain how exactly this zoo situation matters to running simulations; do we go extinct at some time long in the future when our zookeepers stop keeping us alive (and "go extinct before reaching a “posthuman” stage"), having never become powerful zookeeper-level civs ourselves, or are we not permitted to ("extremely unlikely to run a significant number of simulations")?
In case I didn't make it clear, I'm saying that even if a significant proportion of civilizations reach a post-human stage and a significant proportion of these run simulations, there would still potentially be a non-trivial chance of not being in a simulation and instead being in a game or zoo. For example, suppose each post-human civilization makes 100 proper simulations and 100 zoos. Then even if the first two disjuncts of the simulation argument are ruled out, you still have a 50% chance of ending up in a zoo.
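The arithmetic here is simple enough to sketch. (The 100/100 counts are from the example above; the number of civilizations, `n_civs`, is an arbitrary assumed value and cancels out anyway.)

```python
# Each posthuman civilization runs 100 proper ancestor simulations and 100
# zoo-like worlds; an observer in an artificial Earth-like world is equally
# likely to be in any one of them.
n_civs = 1000                      # assumed number of posthuman civilizations
simulations = 100 * n_civs
zoos = 100 * n_civs

p_zoo = zoos / (simulations + zoos)
print(p_zoo)                       # 0.5: half the artificial worlds are zoos
```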
Does this make sense?
I've realized I'm somewhat skeptical of the simulation argument.
The simulation argument proposed by Bostrom holds, roughly, that at least one of the following is true: almost all Earth-like civilizations fail to reach a posthuman level, almost all posthuman civilizations choose not to run many simulations, or we are almost certainly in a simulation.
Now, if we knew that the only two sorts of creatures that experience what we experience are either in simulations or the actual, original, non-simulated Earth, then I can see why the argument would be reasonable. However, I don't know how we could know this.
For example, consider zoos: perhaps advanced aliens create "zoos" featuring humans in an Earth-like world, for their entertainment or other purposes. These wouldn't necessarily be simulations of any actual other planet, but might merely have been inspired by actual planets. Similarly, lions in a zoo are similar to lions in the wild, and their enclosure features plants and other environmental features similar to what they would experience in the wild. But I wouldn't call lions in zoos simulations of wild lions, even if the developed parts where humans could view them were completely invisible to them and their enclosure was arbitrarily large.
Similarly, consider games: perhaps aliens create games or something like them set in Earth-like worlds that aren't actually intended to be simulations of any particular world. Human fantasy RPGs often have a medieval theme, so maybe aliens would create games set in a modern-Earth-like world, without having any actual planet in mind to simulate.
Now, you could argue that in an infinite universe, these things are all actually simulations, because there must be some actual, non-simulated world that's just like the "zoo" or game. However, by that reasoning, you could argue that a rock you pick up is nothing but a "rock simulation" because you know there is at least one other rock in the universe with the exact same configuration and environment as the rock you're holding. That doesn't seem right to me.
Similarly, you could say, then, that I'm actually in a simulation right now. Because even if I'm on the original Earth, there is some other Chantiel in the universe in a situation identical to my current one, who is logically constrained to do the same things I do, and thus I am a simulation of her. And my environment is thus a simulation of hers.
For robustness, you have a dataset that's drawn from the wrong distribution, and you need to act in a way that you would've acted if it was drawn from the correct distribution. If you have an amplification dynamic that moves models towards few attractors, then changing the starting point (training distribution compared to target distribution) probably won't matter. At that point the issue is for the attractor to be useful with respect to all those starting distributions/models. This doesn't automatically make sense, comparing models by usefulness doesn't fall out of the other concepts.
Interesting. Do you have any links discussing this? I read Paul Christiano's post on reliability amplification, but couldn't find mention of this. And, alas, I'm having trouble finding other relevant articles online.
Amplification induces a dynamic in the model space, it's a concept of improving models (or equivalently in this context, distributions). This can be useful when you don't have good datasets, in various ways. Also it ignores independence when talking about recomputing things
Yes, that's true. I'm not claiming that iterated amplification doesn't have advantages. What I'm wondering is whether non-iterated amplification is a viable alternative. I haven't seen non-iterated amplification proposed before for creating AI. Amplification without iteration has the disadvantage that it may lack the attractor dynamic iterated amplification has, but it also avoids the exponentially increasing unreliability iterated amplification has. So, to me at least, it's not clear whether pursuing iterated amplification is a more promising strategy than amplification without iteration.
I've been thinking about what you've said about iterated amplification, and there are some things I'm unsure of. I'm still rather skeptical of the benefit of iterated amplification, so I'd really appreciate a response.
You mentioned that iterated amplification can be useful when you have only very limited, domain-specific models of human behavior, models that on their own would be unable to do things like create code. However, there are two things I'm wondering about. The first is that it seems to me that, for a wide range of situations, you need a general and robustly accurate model of human behavior to perform well. The second is that, even if you don't have a general model of human behavior, it seems to me that a single amplification step is sufficient, which I suppose isn't iterated amplification. The big benefit of avoiding iteration is that iterated amplification suffers exponential decreases in reliability from compounding errors at each distillation step; with a single amplification step, this exponential decrease wouldn't occur.
For the first topic, suppose your AI is trained to make movies. I think just about every human value is relevant to the creation of movies, because humans usually like movies with a happy ending, and to make an ending happy you need to understand what humans consider a "happy ending".
Further, you would need an accurate model of human cognitive capabilities. To make a good movie, it needs to be easy enough for humans to understand. But sometimes it also shouldn't be too easy, because that can remove the mystery of it.
And the above is not just true for movies: I think creating other forms of entertainment would involve the same things as above.
Could you do the above with only some domain-limited model of what counts as confusing or as a good or bad ending in the context of movies? It's not clear to me that this is possible. Movies involve a very wide variety of situations, and you need to keep things understandable and arrive at a happy ending in all of those circumstances. I don't see how you could robustly do this without a general model of what people find confusing or otherwise bad.
Further, whenever an AI needs to explain something to humans, it seems to me that it's important that it has an accurate model of what humans can understand and not understand. Is there any way to do this with purely domain-specific models rather than with a general understanding of what people find confusing? It's not clear to me that this is possible. For example, imagine an AI that needs to explain many different things. Maybe it's tasked with creating learning materials or making the news. With such a broad category of things the AI needs to explain, it's really not clear to me how an AI could do this without a general model of what makes things confusing or not.
Also more generally, it seems to me that whenever the AI is involved with human interaction in novel circumstances, it will need an accurate model of what people like and dislike. For example, consider an AI tasked with coming up with a plan for human workers. Doing so has the potential to involve an extremely wide range of values. For example, humans generally value novelty, autonomy, not feeling embarrassed, not being bored, not being overly pressured, not feeling offended, and not seeing disgusting or ugly things.
Could you have an AI learn to avoid these things with only domain-specific models, rather than a general understanding of what people value and disvalue? I'm not sure how. Maybe you could learn models that reflect people's values in limited circumstances. However, I think an essential component of intelligence is coming up with novel plans for novel situations, and I don't see how an agent could do this without a general understanding of values. For example, the AI might create entire new industries, and it would be important that any human workers in those industries have satisfactory conditions.
Now, for the second topic: using amplification without iteration.
First off, I want to note that, even without a general model of humans, it's still not clear to me that you need any amplification at all. As I've said before, even mere human imitation has the potential to result in extremely high intelligence, simply by doing the same things humans do, but much faster. As I mentioned previously, suppose the human output consists of published research papers from top researchers, and the AI is tasked with mimicking it. Then the AI could take those papers as the human output and use them to create future papers, but far, far faster.
But suppose you do still need amplification. Then I don't see why one amplification step wouldn't be enough. I think that if you put together a sufficiently large number of intelligent humans and give them unlimited time to think, they'd be able to solve pretty much anything that iterated amplification with HCH would be able to solve. So, instead of having multiple amplification and distillation steps, you could instead have one very large amplification step, involving a large enough number of human models interacting that it could solve pretty much anything.
If the amplification step involves a sufficiently large number of people, you might be concerned that it would be intractable to emulate them all.
I'm not sure this would be a problem. Consider again the AI designed to mimic the research papers of top researchers. Often a small number of top researchers are responsible for a large proportion of research progress, so the AI could potentially just predict what the output of the top, say, 100 or 1,000 researchers working together would be. And the AI would potentially be able to produce the outputs of each researcher with far less computation. That sounds plausibly like enough to me.
But suppose that's not enough, and emulating every human individually during the amplification step is intractable. Then here's how I think you can get around this: train not only a human model, but also a system for approximating the output of an expensive computation at much lower computational cost. Then, for the amplification step, you can define a computation involving an extremely large number of interacting emulated humans, and allow the approximation system to approximate its output without directly emulating every human.
To give a sense of how this might work, note that in a computation, a small number of the parts often account for a large part of the output. For example, if you are trying to approximate a computation about gravity, commonly only the closest, most massive objects have a significant gravitational effect on something, and you can ignore the rest. Similarly, rather than simulate individual atoms, it's much more efficient to group large numbers of atoms together and consider their effect as a group. The same is true for other computations involving many small components.
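The gravity version of the grouping idea can be sketched concretely. This is a toy, one-dimensional version of the kind of aggregation used in tree-based N-body solvers; the function names and numbers are made up for illustration:

```python
def gravity_exact(target, bodies, G=6.674e-11):
    """Sum the gravitational field at `target` from every body individually."""
    total = 0.0
    for mass, x in bodies:
        total += G * mass / (x - target) ** 2
    return total

def gravity_grouped(target, bodies, G=6.674e-11):
    """Approximate a distant cluster of bodies by a single point mass at
    its center of mass -- one term instead of many."""
    mass = sum(m for m, _ in bodies)
    center = sum(m * x for m, x in bodies) / mass
    return G * mass / (center - target) ** 2

# A tight, far-away cluster: grouping barely changes the answer.
cluster = [(1e20, 1e9 + dx) for dx in (-1.0, 0.0, 1.0)]
exact = gravity_exact(0.0, cluster)
approx = gravity_grouped(0.0, cluster)
print(abs(exact - approx) / exact)   # tiny relative error
```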
To emulate humans, you could potentially do the same thing as when simulating gravity. Specifically, an AI may be able to consider groups of humans and infer what the final output of each group will be, without actually needing to emulate each member individually. Further, for very challenging topics, many people may fail to contribute anything to the final result, so the AI could potentially avoid emulating them at all.
So I still can't really see the benefit of iterated amplification. Of course, I could be missing something, so I'm interested in hearing what you think.
One potential problem is that it might be hard to come up with good training data for an arbitrary-function approximator, since finding the exact output of expensive functions would itself be expensive. However, it's not clear to me how big a problem this would be. As I've said before, even the output of 100 or 1,000 humans interacting could potentially be all the AI ever needs, and with sufficiently fast approximations of individual humans, creating training data for this could be tractable.
Further, I bet the AI could learn a lot about arbitrary-function approximation just by training on approximating functions that are already reasonably fast to compute. I think the basic techniques for quickly approximating functions are what I mentioned before: come up with abstract objects that group individual components, and know when to stop computing on a given object because it's clear it will have little effect on the final result.
I hadn't fully appreciated the difficulty that could result from AIs having alien concepts, so thanks for bringing it up.
However, it seems to me that this would not be a big problem, provided the AI is still interpretable. I'll provide two ways to handle this.
For one, you could potentially translate the human concepts you care about into statements using the AI's concepts. Even if the AI doesn't use the same concepts people do, AIs are still incentivized to form a detailed model of the world. If you have access to the AI's entire world model but still can't figure out basic things, like whether the model implies the world gets destroyed or the AI takes over the world, then that model doesn't seem very interpretable. So I'm skeptical that this would really be a problem.
But, if it is, it seems to me that there's a way to get the AI to have non-alien concepts.
In a comment exchange with another person, I modified the system so that the people outputting utilities can refuse to output one for a given query, for example because the situation is too complicated or too vague for humans to judge the desirability of. This could potentially prevent the AI from having very alien concepts.
To deal with alien concepts, you can just have the people refuse to provide a utility for a description of a possible future if the description is expressed in concepts they can't understand. This way, the AI would have to come up with reasonably non-alien concepts in order to get any of its calls to its utility function to work.
Another problem is that the system cannot represent and communicate the whole predicted future history of the universe to us.
This is a good point and one that I, foolishly, hadn't considered.
However, it seems to me that there is a way to get around this. Specifically, just provide the query-answerers the option to refuse to evaluate the utility of a description of a possible future. If this happens, the AI won't be able to have its utility function return a value for such a possible future.
To see how to do this, note that if a description of a possible future world is too large for the human to understand, then the human can refuse to provide a utility for it.
Similarly, if the description of the future doesn't specify the future with sufficient detail that the person can clearly tell if the described outcome would be good, then the person can also refuse to return a value.
For example, suppose you are making an AI designed to make paperclips. And suppose the AI queries the person asking for the utility of the possible future described by, "The AI makes a ton of paperclips". Then the person could refuse to answer, because the description is insufficient to specify the quality of the outcome, for example, because it doesn't say whether or not Earth got destroyed.
Instead, a possible future would only be rated as high utility if it says something like, "The AI makes a ton of paperclips, and the world isn't destroyed, and the AI doesn't take over the world, and no creatures get tortured anywhere in our Hubble sphere, and creatures in the universe are generally satisfied".
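The refusal mechanism above can be sketched as a small oracle that returns nothing unless a description pins down the facts needed to judge it. All predicate names, the required-facts list, and the length cutoff here are invented for illustration:

```python
# Sketch: the human rater refuses (returns None) unless a description of a
# possible future settles the facts needed to judge it; the AI only gets a
# utility for sufficiently detailed, humanly comprehensible descriptions.

REQUIRED_FACTS = {"earth_intact", "ai_does_not_take_over", "no_torture"}
MAX_LENGTH = 10_000   # refuse descriptions too large for a human to understand

def rate_future(description: dict, text_length: int):
    if text_length > MAX_LENGTH:
        return None                  # too big to understand: refuse
    if not REQUIRED_FACTS <= description.keys():
        return None                  # under-specified: refuse
    if not all(description[f] for f in REQUIRED_FACTS):
        return 0.0                   # a clearly bad outcome
    return 1.0                       # good outcome, e.g. paperclips made safely

# "The AI makes a ton of paperclips" alone says nothing about Earth:
print(rate_future({"many_paperclips": True}, 40))       # None (refused)
# A description that settles the required facts gets a rating:
full = {"earth_intact": True, "ai_does_not_take_over": True,
        "no_torture": True, "many_paperclips": True}
print(rate_future(full, 120))                           # 1.0
```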
Does this make sense?
I, of course, could always be missing something.
(Sorry for the late response)
Sorry for taking a ridiculously long time to get back to you. I was dealing with some stuff.
This works great when you can recognize good things within the represention the AI uses to think about the world. But what if that's not true?
Yes, that is correct. As I said in the article, a high degree of interpretability is necessary to use the idea.
It's true that interpretability is required, but the key point of my scheme is this: interpretability is all you need for intent alignment, provided my scheme is correct. I don't know of any other alignment strategies for which this is the case. So my scheme, if correct, basically allows you to bypass what is plausibly the hardest part of AI safety: robust value-loading.
I know of course that I could be wrong about this, but if the technique is correct, it seems like a quite promising AI safety technique to me.
Does this seem reasonable? I may very well just be misunderstanding or missing something.
I've made a few posts that seemed to contain potentially valuable ideas related to AI safety. However, I got almost no feedback on them, so I was hoping some people could look at them and tell me what they think. They still seem valid to me, and if they are, they could potentially be very valuable contributions. And if they aren't valid, then I think knowing the reason for this could potentially help me a lot in my future efforts towards contributing to AI safety.
The posts are:
FWIW, this conclusion is not clear to me. To return to one of my original points: I don't think you can dodge this objection by arguing from potentially idiosyncratic preferences, even perfectly reasonable ones; rather, you need it to be the case that no rational agent could have different preferences. Either that, or you need to be willing to override otherwise rational individual preferences when making interpersonal tradeoffs.
Yes, that's correct. It's possible that there are some agents with consistent preferences that really would wish to become extraordinarily uncomfortable to avoid the torture. My point was just that this doesn't seem like it would be a common thing for agents to want.
Still, it is conceivable that there are at least a few agents out there that would consistently want to opt for the 0.5 chance of being extremely uncomfortable, and I do suppose it would be best to respect their wishes. This is a problem I hadn't previously fully appreciated, so I'd like to thank you for bringing it up.
Luckily, I think I've finally figured out a way to adapt my ethical system to deal with this. That is, the adaptation will allow agents to choose the extreme-discomfort-from-dust-specks option, if that is what they wish, and have my ethical system respect their preferences. To do this, allow the satisfaction measure to include infinitesimals. Then, to respect the preferences of such agents, you just need to pick the right satisfaction measure.
Consider an agent for which each 50 years of torture causes a linear decrease in its utility function. For simplicity, imagine torture and discomfort are the only things the agent cares about; it has no other preferences. Also assume the agent dislikes torture more than it dislikes discomfort, but only by a finite amount. Since the agent's utility function/satisfaction measure is linear, I suppose being tortured for an eternity would be infinitely worse for the agent than being tortured for a finite amount of time. So, assign satisfaction 0 to the scenario in which the agent is tortured for eternity. And if the agent is instead tortured for n years, let the agent's satisfaction be 1 − nε, where ε is whatever infinitesimal number you want. If my understanding of infinitesimals is correct, I think this will do what we want it to do in terms of having agents using my ethical system respect the agent's preferences.
Specifically, since being tortured forever would be infinitely worse than being tortured for a finite amount of time, any finite amount of torture would be accepted to decrease the chance of infinite torture. And this is what maximizing this satisfaction measure does: for any lottery, changing the chance of infinite torture has a finite effect on expected satisfaction, whereas changing the chance of finite torture has only an infinitesimal effect, so avoiding infinite torture would be prioritized.
Further, among lotteries involving finite amounts of torture, the ethical system using this satisfaction measure continues to do what it's supposed to do. For example, consider the choice between the previous two options:
- A 0.5 chance of being tortured for 3^^^^3 years and a 0.5 chance of being fine.
- A 0.5 chance of 3^^^^3 - 9999999 years of torture and 0.5 chance of being extraordinarily uncomfortable.
If I'm using my infinitesimal math right, the expected satisfaction of taking option 1 would be 1 − 0.5 · 3^^^^3 · ε, and the expected satisfaction of taking option 2 would be 1 − 0.5 · (3^^^^3 − 9999999) · ε − 0.5kε, for some finite k representing how much the agent dislikes extreme discomfort. Thus, to maximize this agent's satisfaction measure, my moral system would indeed let the agent give infinite priority to avoiding infinite torture; the ethical system would itself consider the agent's getting infinite torture infinitely worse than getting finite torture, and would treat finite amounts of torture as decreasing satisfaction in a linear manner. And, since the satisfaction measure is still technically bounded, it would still avoid the problem with utility monsters.
(In case it was unclear, "^^^^" is Knuth's up-arrow notation, with "^" in place of "↑".)
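The infinitesimal arithmetic above can be checked with a toy model: represent each satisfaction value as a pair (standard part, ε-coefficient) compared lexicographically, so the standard part always dominates. Here `T` is a large stand-in for 3^^^^3 and `k` is an assumed finite discomfort penalty; both are invented for illustration:

```python
from fractions import Fraction

class Sat:
    """Satisfaction value a + b*eps, where eps is a positive infinitesimal.
    Ordering is lexicographic: the standard part a dominates, and the
    eps-coefficient b only breaks ties between equal standard parts."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __lt__(self, other):
        return (self.a, self.b) < (other.a, other.b)

def expect(lottery):
    """Expected satisfaction of a lottery [(probability, Sat), ...]."""
    a = sum(Fraction(p) * s.a for p, s in lottery)
    b = sum(Fraction(p) * s.b for p, s in lottery)
    return Sat(a, b)

T = 10**100                      # stand-in for 3^^^^3 years of torture
k = 1000                         # assumed finite eps-penalty for discomfort

fine = Sat(1, 0)
def tortured(years):             # satisfaction 1 - years*eps
    return Sat(1, -years)
uncomfortable = Sat(1, -k)
eternal = Sat(0, 0)              # infinite torture: finitely below everything

option1 = expect([(Fraction(1, 2), tortured(T)), (Fraction(1, 2), fine)])
option2 = expect([(Fraction(1, 2), tortured(T - 9999999)),
                  (Fraction(1, 2), uncomfortable)])

print(option1 < option2)   # True: trading discomfort for fewer torture-years
print(eternal < option1)   # True: any finite torture beats infinite torture
```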
If the impact measure were poorly implemented, then I think such an impact-reducing AI could indeed result in the world turning out that way. However, note that the technique in the paper is intended, for a very wide range of variables, to make the world in which the AI wasn't turned on as similar as possible to the world in which it was. So you can potentially avoid the AI-controlled-drone scenario by including the variable "number of AI-controlled drones in the world", or something correlated with it, as such variables could have quite different values between a possible world in which the AI was turned on and one in which it wasn't.
Coming up with a set of variables wide enough to include that might seem a little difficult, but I'm not sure it would be. One option is to, for every definable function of the world, include the value of the function as one of the variables the AI considers and tries to avoid interfering with.
I have some concerns about an impact measure proposed here. I'm interested in working on impact measures, and these seem like very serious concerns to me, so it would be helpful to see what others think about them. I asked Stuart, one of the authors, about these concerns, but he said he was too busy to work on dealing with them.
First, I'll give a basic description of the impact measure. Have your AI be turned on by some sort of stochastic process that may or may not result in the AI being turned on. For example, consider sending a photon through a semi-silvered mirror. Let X be the statement "The AI was turned on". For a wide range of variables V, try to make the probability distribution P(V | X) as similar as possible to the probability distribution P(V | ¬X).
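The "as similar as possible" objective can be read as minimizing some divergence between the two conditional distributions. Here is a sketch using total variation distance; the variable V and its values are invented for illustration:

```python
def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two discrete distributions."""
    support = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

# Distribution of V = "number of AI-controlled drones in the world",
# conditional on the AI being turned on (X) vs. not turned on (not-X).
p_v_given_x    = {0: 0.2, 10: 0.3, 1000: 0.5}
p_v_given_notx = {0: 0.9, 10: 0.1}

impact = total_variation(p_v_given_x, p_v_given_notx)
print(impact)   # a large distance means high measured impact for this V
```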
Here are my concerns:
It's not clear to me that we can find any verifiably stochastic process. Some interpretations of quantum physics say that there are latent variables that determine the outcomes of seemingly stochastic processes. In this case, just knowing whether the AI was turned on may thus have implications for latent variables that possibly have large effects on the world.
Further, I can't find a way to define X and not-X that works, even given a truly stochastic process. You could try to specify X as "the photon makes it through the semi-silvered mirror" and specify not-X as the negation of this. But conditioning on this and nothing else is insufficient. To see why, note that X and not-X don't specify anything about the mechanics of how the photon and mirror interact. For example, X as stated doesn't rule out universes in which there is no quantum physics as we know it, and the mirror and photon interact in some completely different way that makes the photon more likely to pass through than it would in our universe. Thus, conditioning on X would make it more likely that the AI is in such a universe, which could have large effects on the values of variables.
You could try to get around this by also including in X and not-X mechanics describing how the photon and semi-silvered mirror interact. The issue is, we don't actually know exactly how the photon and semi-silvered mirror interact. We have models of quantum physics that have extremely high predictive accuracy in practice, but we still don't have a theory of everything. So we could include in X and not-X our current understanding of quantum physics, but that understanding might be somewhat wrong.
This could potentially be dangerous. Suppose the AI has invented some technology that, according to the actual, correct theory of everything, would destroy the world. But according to our slightly wrong model of quantum physics, it would do nothing in particular. Then the AI, having X and not-X condition on our current model of quantum physics, would see no problem with creating the technology.
Further still, I have the concern about what to do if the AI in fact doesn't turn on. You could decide to just turn on an identical AI, and keep turning on AIs until one turns on. But then the world would be pretty much identical whether X or not-X turns out to be true, because the other AI would do the same things the first one would have.
You could instead turn on some AI that uses the same reduced-impact technique but differs from the current AI in some way that significantly affects the world. However, if you do this, then I think we might arrive at a very bad equilibrium in the AIs' behavior. Specifically, suppose all the AIs decide to destroy the world if they're turned on. Then if one AI isn't turned on, some other AI will be, which would also lead to the world being destroyed. If all the destroyed worlds look pretty much the same, then an AI's deciding to destroy the world would have low impact according to this definition.
So suppose instead that if the AI doesn't turn on, we rely on some other AI safety technique. If this technique successfully makes safe AI, then we had no need to create the reduced-impact AI in the first place, because the other design would have worked fine. And if the alternative technique in fact doesn't work, and the result of using it is that our world gets destroyed, then conditioning on not-X implies the world gets destroyed. So if the AI tries to make the world conditional on X similar to the world conditional on not-X, it would again act to destroy the world.
Luke responded to one of my points, saying, "Yes. [What to do if the AI doesn't get turned on] needs to be handled with care. A time-delay might be sufficient for this, though." My response was:
My concern with time-delays is that the AI might make some sort of time-bomb-esque device. For example, suppose the AI knows that if it wasn't turned on, another AI like it would have been turned on three months later. Then both AIs could decide to make a "time bomb" set to go off four months after the first AI was scheduled to potentially turn on. And when it goes off, the "time bomb" might destroy the world, convert it into a mass of AI-controlled nanotechnology, or do some other bad thing. This way, neither AI would actually change the world relative to if it was never turned on.
I think this framing muddies the intuition pump by introducing sadistic preferences, rather than focusing just on unboundedness below. I don't think it's necessary to do this: unboundedness below means there's a sense in which everyone is a potential "negative utility monster" if you torture them long enough. I think the core issue here is whether there's some point at which we just stop caring, or whether that's morally repugnant.
Fair enough. So I'll provide a non-sadistic scenario. Consider again the scenario I previously described in which you have a 0.5 chance of being tortured for 3^^^^3 years, but also have the repeated opportunity to cause yourself minor discomfort in the case of not being tortured and as a result get your possible torture sentence reduced by 50 years.
If you have an unbounded-below utility function in which each 50 years of torture causes a linear decrease in satisfaction or utility, then to maximize expected utility or life satisfaction, it seems you would need to opt for living in extreme discomfort in the non-torture scenario to decrease your possible torture time by an astronomically small proportion, provided the expectations are defined.
To me, at least, it seems clear that you should not take the opportunities to reduce your torture sentence. After all, if you repeatedly decide to take them, you will end up with a 0.5 chance of being highly uncomfortable and a 0.5 chance of being tortured for 3^^^^3 years. This seems like a really bad lottery, and worse than the one that lets me have a 0.5 chance of having an okay life.
Sorry, sloppy wording on my part. The question should have been "does this actually prevent us having a consistent preference ordering over gambles over universes" (even if we are not able to represent those preferences as maximising the expectation of a real-valued social welfare function)? We know (from lexicographic preferences) that "no-real-valued-utility-function-we-are-maximising-expectations-of" does not immediately imply "no-consistent-preference-ordering" (if we're willing to accept orderings that violate continuity). So pointing to undefined expectations doesn't seem to immediately rule out consistent choice.
Oh, I see. And yes, you can have consistent preference orderings that can't be represented as a utility function. And such techniques have been proposed before in infinite ethics. For example, one of Bostrom's proposals for dealing with infinite ethics is the extended decision rule. Essentially, it says to first look at the set of actions you could take that would maximize P(infinite good) - P(infinite bad). If there is only one such action, take it. Otherwise, take whichever of these actions has the highest expected moral value given a finite universe.
As far as I know, you can't represent the above as a utility function, despite it being consistent.
However, the big problem with the above decision rule is that it suffers from the fanaticism problem: people would be willing to bear any finite cost, even 3^^^3 years of torture, to have even an unfathomably small chance of increasing the probability of infinite good or decreasing the probability of infinite bad. And this can get to pretty ridiculous levels. For example, suppose you are sure you can easily design a world that, if implemented, makes every creature happy and greatly increases the moral value of the world in a finite universe. However, coming up with such a design would take one second of computation on your supercomputer, which means one less second to keep thinking about astronomically improbable situations in which you could cause infinite good. That lost second would carry some minuscule chance of forgoing infinite good or causing infinite bad. Thus, you decide not to help anyone, because you won't spare the one second of computer time.
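The extended decision rule and its fanaticism problem can be sketched in a few lines. This is only an illustration under assumed, made-up action data (the action names, probabilities, and finite values below are hypothetical, not from Bostrom):

```python
# A sketch of the extended decision rule described above.
# Step 1: keep only the actions maximising P(infinite good) - P(infinite bad).
# Step 2: break ties by expected finite moral value.
# All action data below is hypothetical, for illustration only.

def extended_decision_rule(actions):
    # actions: list of (name, p_inf_good, p_inf_bad, finite_expected_value)
    def infinite_score(a):
        return a[1] - a[2]
    best_inf = max(infinite_score(a) for a in actions)
    candidates = [a for a in actions if infinite_score(a) == best_inf]
    return max(candidates, key=lambda a: a[3])

actions = [
    ("help everyone now",       0.0,   0.0, 100.0),
    ("ponder infinite schemes", 1e-30, 0.0,   0.0),
    ("ponder, then help a bit", 1e-30, 0.0,   1.0),
]
best = extended_decision_rule(actions)
print(best[0])  # -> ponder, then help a bit
```

Note how "help everyone now" loses despite its vastly larger finite value: any minuscule edge in P(infinite good) lexically dominates, which is exactly the fanaticism worry.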
More generally, I think the basic property of non-real-valued consistent preference orderings is that they value some things "infinitely more" than others. The issue is, if you really value some property infinitely more than some other property of lesser importance, it won't be worth your time to even consider pursuing the property of lesser importance, because it's always possible you could have used the extra computation to slightly increase your chances of getting the property of greater importance.
Also, in addition to my previous response, I want to note that the issues with unbounded satisfaction measures are not unique to my infinite ethical system. Instead, they are common potential problems with a wide variety of aggregate consequentialist theories.
For example, suppose you're a classical utilitarian with an unbounded utility measure per person. And suppose you know that the universe is finite and will consist of a single inhabitant whose utility follows a Cauchy distribution. Then your expected utilities are undefined, despite the universe being knowably finite.
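The non-convergence here is a standard fact about the Cauchy distribution, and it can be checked directly: the defining integral for the mean never settles, because each tail's contribution grows like log(M) without bound. A minimal numeric check:

```python
import math

# The standard Cauchy density is f(x) = 1 / (pi * (1 + x^2)).
# Its mean is undefined because the positive-tail contribution
#   integral from 0 to M of x * f(x) dx = ln(1 + M^2) / (2*pi)
# grows without bound as M -> infinity (symmetrically for the
# negative tail), so the defining integral never converges.

def tail_contribution(M):
    return math.log(1 + M**2) / (2 * math.pi)

for M in (10, 1_000, 100_000):
    print(M, round(tail_contribution(M), 3))
# Each 100-fold increase in M adds roughly the same amount:
# the partial integrals grow like log(M) forever instead of converging.
```

So an expected-utility maximizer facing a Cauchy-distributed utility has no expectation to maximize, which is the point being made above.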
Similarly, imagine if you again used classical utilitarianism, but instead you have a finite universe with one utility monster and 3^^^3 regular people. Then, if your expected utilities are defined, you would need to give the utility monster what it wants, at the expense of everyone else.
So, I don't think your concern about keeping utility functions bounded is unwarranted; I'm just noting that these problems are part of a broader issue with aggregate consequentialism, not just with my ethical system.
Thanks. I've toyed with similar ideas previously myself. The advantage, if this sort of thing works, is that it conveniently avoids a major issue with preference-based measures: that they're not unique and therefore incomparable across individuals. However, this method seems fragile in relying on a finite number of scenarios: doesn't it break if it's possible to imagine something worse than whatever the currently worst scenario is? (E.g. just keep adding 50 more years of torture.) While this might be a reasonable approximation in some circumstances, it doesn't seem like a fully coherent solution to me.
As I said, you can allow for infinitely-many scenarios if you want; you just need to make it so the supremum of their values is 1 and the infimum is 0. That is, imagine there's an infinite sequence of scenarios you can come up with, each of which is worse than the last. Then just require that the infimum of the satisfactions of those scenarios is 0. That way, as you consider worse and worse scenarios, the satisfaction continues to decrease, but never gets below 0.
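The construction above can be made concrete with any squashing function that is strictly decreasing in "badness" with infimum 0. The specific function below is an illustrative assumption, not the author's specification:

```python
# A sketch of the infinite-scenario construction described above:
# infinitely many ever-worse scenarios, each mapped to a satisfaction
# value in (0, 1], strictly decreasing, with infimum 0, so satisfaction
# keeps dropping but never crosses the lower bound.
# (The specific squashing function is an illustrative choice.)

def satisfaction(badness):
    # badness >= 0 can grow without bound; satisfaction stays in (0, 1].
    return 1.0 / (1.0 + badness)

values = [satisfaction(k) for k in range(6)]
assert all(v > 0 for v in values)                      # never reaches 0
assert all(a > b for a, b in zip(values, values[1:]))  # strictly worse
# values decrease toward, but never reach, 0
```

Adding "50 more years of torture" just increases badness and shaves off a bit more satisfaction; there is no worst scenario needed, only the infimum.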
IMO, the problem highlighted by the utility monster objection is fundamentally a prioritarian one. A transformation that guarantees boundedness above seems capable of resolving this, without requiring boundedness below (and thus avoiding the problematic consequences that boundedness below introduces).
One issue with only having boundedness above is that the expected life satisfaction of an arbitrary agent would probably often be undefined or −∞ in expectation. For example, consider an agent with a probability distribution like a Cauchy distribution, except that it assigns probability 0 to anything above the maximum level of satisfaction, and is then renormalized so the probabilities sum to 1. If I'm doing my calculus right, the resulting probability distribution's expected value doesn't converge. You could either interpret this as the expected utility being undefined or being −∞, since the Riemann sum approaches −∞ as the width of the columns approaches zero.
That said, even if the expectations are defined, it doesn't seem to me that keeping the satisfaction measure bounded above but not below would solve the problem of utility monsters. To see why, imagine a new utility monster as follows. The utility monster feels an incredibly strong need to have everyone on Earth be tortured. For the next hundred years, its satisfaction will decrease by 3^^^3 for every second there's someone on Earth not being tortured. Thus, assuming the expectations converge, the moral thing to do, according to maximizing average, total, or expected-value-conditioning-on-being-in-this-universe life satisfaction, is to torture everyone. This is a problem in both finite and infinite cases.
A final random thought/question: I get that we can't expected utility maximise unless we can take finite expectations, but does this actually prevent us having a consistent preference ordering over universes, or is it potentially just a representation issue?
If I understand what you're asking correctly, you can indeed have consistent preferences over universes, even if you don't have a bounded utility function. The issue is, in order to act, you need more than just a consistent preference order over possible universes. In reality, you only get to choose between probability distributions over possible worlds, not specific possible worlds. And with an unbounded utility function, this will tend to result in undefined expected utilities over possible actions, and thus will fail to inform you which action you should take, which is the whole point of utility theory and ethics.
Now, some probability distributions can yield well-defined expected values even with an unbounded utility function. But, as I said, this is not robust, and I think that in practice the expected values of an unbounded utility function would be undefined.
For the record, according to my intuitions, average consequentialism seems perfectly fine to me in a finite universe.
That said, if you don't like using average consequentialism in a finite case, I don't personally see what's wrong with just having a somewhat different ethical system for finite cases. I know it seems ad-hoc, but I think there really is an important distinction between finite and infinite scenarios. Specifically, people have the moral intuition that larger numbers of satisfied lives are more valuable than smaller numbers of them, which average utilitarianism conflicts with. But in an infinite universe, you can't change the total amount of satisfaction or dissatisfaction.
But, if you want, you could combine both the finite ethical system and infinite ethical system so that a single principle is used for moral deliberation. This might make it feel less ad hoc. For example, you could have a moral value function of the form f(total amount of satisfaction and dissatisfaction in the universe) * (expected value of life satisfaction for an arbitrary agent in this universe). And let f be some bounded, increasing function that approaches its supremum very slowly.
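A combined value function like this can be sketched numerically. The particular choice of f below (and all the numbers) are illustrative assumptions, since the text doesn't pin down a specific f:

```python
# A sketch of the combined moral value function described above.
# All functional forms and numbers here are illustrative assumptions,
# not the author's specification.

def f(total_satisfaction):
    # Bounded above by 1, increasing, approaching 1 only slowly.
    return total_satisfaction / (1.0 + total_satisfaction)

def moral_value(total_satisfaction, expected_life_satisfaction):
    return f(total_satisfaction) * expected_life_satisfaction

# More total satisfaction still matters (the total-utilitarian intuition)...
assert moral_value(100, 0.8) > moral_value(10, 0.8)
# ...but f's boundedness means that between already-large worlds,
# the average-style term dominates the comparison:
assert moral_value(10**9, 0.9) > moral_value(10**12, 0.8)
```

The design choice is that f's saturation makes sheer size matter a lot among small worlds and almost not at all among huge ones, blending the total and average views.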
For those who don't want this, they are free to use my total-utilitarian-infinite-ethical system. I think that it just ends up as regular total utilitarianism in a finite world, or close to it.
In "P(old probability of being in first group) * 1 = (P(old probability of being in first group) + ε) * u", the epsilon is smaller than any real number, and there is no real small enough that it could characterise the difference between 1 and u.
Could you explain why you think so? I had already explained why ε would be real, so I'm wondering if you had an issue with my reasoning. To quote my past self:
Remember that if you decide to take a certain action, that implies that other agents who are sufficiently similar to you and in sufficiently similar circumstances also take that action. Thus, you can acausally have non-infinitesimal impact on the satisfaction of agents in situations of the form, "An agent in a world with someone just like Slider who is also in very similar circumstances to Slider's." The above scenario is of finite complexity and isn't ruled out by evidence. Thus, the probability of an agent ending up in such a situation, conditioning only on being some agent in this universe, is nonzero [and non-infinitesimal].
If you have some odds or expectations that deal with groups, and you have other considerations that deal with a finite number of individuals, then either the finite people don't impact the probabilities at all, or the probabilities stay infinitesimally close (for which I see a~b being used as I read up on infinities), which will conflict with the desiderata...
Just to remind you, my ethical system basically never needs to worry about finite impacts. My ethical system doesn't worry about causal impacts, except to the extent that they inform you about the total acausal impact of your actions on the moral value of the universe. All things you do have infinite acausal impact, and these are all my system needs to consider. To use my ethical system, you don't even need a notion of causal impact at all.
It's possible that (a) is true, and much of your response seems like it's probably (?) targeted at that claim, but FWIW, I don't think this case can be convincingly made by appealing to contingent personal values: e.g. suggesting that another 50 years of torture wouldn't much matter to you personally won't escape the objection, as long as there's a possible agent who would view their life-satisfaction as being materially reduced in the same circumstances.
To some extent, whether or not life satisfaction is bounded just comes down to how you want to measure it. But it seems to me that any reasonable measure of life satisfaction really would be bounded.
I'll clarify the measure of life satisfaction I had in mind. Imagine if you showed an agent finitely-many descriptions of situations they could end up being in, and asked the agent to pick out the worst and the best of all of them. Assign the worst scenario satisfaction 0 and the best scenario satisfaction 1. For any other outcome w, set the satisfaction to p, where p is the probability at which the agent would be indifferent between getting satisfaction 1 with probability p and satisfaction 0 with probability 1 - p. This is very much like a certain technique for constructing a utility function from elicited preferences. So, according to my definition, life satisfaction is bounded by definition.
(You can also take the limit of the agent's preferences as the number of described situations approaches infinity, if you want and if it converges. If it doesn't, then you could instead just ask the agent about its preferences over infinitely-many scenarios and require the infimum of satisfactions to be 0 and the supremum to be 1. Also, you might need to do something special to deal with agents with preferences that are inconsistent even given infinite reflection, but I don't think this is particularly relevant to the discussion.)
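The elicitation procedure above is essentially the von Neumann-Morgenstern normalization: for a vNM agent, the indifference probability p equals the agent's utility affinely rescaled so that worst maps to 0 and best to 1. A minimal sketch, with hypothetical raw utilities:

```python
# A sketch of the elicitation-based satisfaction measure described above.
# For a von Neumann-Morgenstern agent, the indifference probability p for
# outcome w (between "best with probability p" and "worst with 1 - p")
# equals the agent's utility rescaled to [0, 1], so satisfaction is
# bounded by construction. (The raw utilities below are hypothetical.)

def normalized_satisfaction(raw_utility, worst_utility, best_utility):
    # Affine rescaling: worst -> 0, best -> 1.
    return (raw_utility - worst_utility) / (best_utility - worst_utility)

raw = {"worst": -500.0, "mediocre": -100.0, "best": 300.0}
p = normalized_satisfaction(raw["mediocre"], raw["worst"], raw["best"])
assert p == 0.5  # indifferent at a 50/50 gamble between best and worst
assert normalized_satisfaction(raw["worst"], raw["worst"], raw["best"]) == 0.0
assert normalized_satisfaction(raw["best"], raw["worst"], raw["best"]) == 1.0
```

Because the output is an affine rescaling into [0, 1], boundedness holds no matter how extreme the underlying raw utilities are.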
Now, maybe you're opposed to this measure. However, if you reject it, I think you have a pretty big problem you need to deal with: utility monsters.
To quote Wikipedia:
A hypothetical being, which Nozick calls the utility monster, receives much more utility from each unit of a resource they consume than anyone else does. For instance, eating a cookie might bring only one unit of pleasure to an ordinary person but could bring 100 units of pleasure to a utility monster. If the utility monster can get so much pleasure from each unit of resources, it follows from utilitarianism that the distribution of resources should acknowledge this. If the utility monster existed, it would justify the mistreatment and perhaps annihilation of everyone else, according to the mandates of utilitarianism, because, for the utility monster, the pleasure they receive outweighs the suffering they may cause.
If you have some agents with unbounded measures of satisfaction, then I think that would imply you would need to be willing to cause arbitrarily large amounts of suffering to agents with bounded satisfaction in order to increase the satisfaction of a utility monster as much as possible.
This seems pretty horrible to me, so I'm satisfied with keeping the measure of life satisfaction to be bounded.
In principle, you could have utility monster-like creatures in my ethical system, too. Perhaps all the agents other than the monster really have very little in the way of preferences, and so their life satisfaction doesn't change much at all by you helping them. Then you could potentially give resources to the monster. However, the effect of "utility monsters" is much more limited in my ethical system, and it's an effect that doesn't seem intuitively undesirable to me. Unlike if you had an unbounded satisfaction measure, my ethical system doesn't allow a single agent to cause arbitrarily large amounts of suffering to arbitrarily large numbers of other agents.
Further, suppose you do decide to have an unbounded measure of life satisfaction and aggregate it to allow even a finite universe to have arbitrarily high or low moral value. Then the expected moral values of the world would be undefined, just like how the expected values of unbounded utility functions are undefined. Specifically, just consider having a Cauchy distribution over the moral value of the universe. Such a distribution has no expected value. So, if you're trying to maximize the expected moral value of the universe, you won't be able to. And, as a moral agent, what else are you supposed to do?
Also, I want to mention that there's a trivial case in which you could avoid having my ethical system torture the agent for 50 years. Specifically, maybe there's some certain 50 years that decreases the agent's life satisfaction a lot, even though the other 50 years don't. For example, maybe the agent dreads the idea of having more than a million years of torture, so specifically adding those last 50 years would be a problem. But I'm guessing you aren't worrying about this specific case.
Thanks for the response.
Third, the average view prefers arbitrarily small populations over very large populations, as long as the average wellbeing was higher. For example, a world with a single, extremely happy individual would be favored to a world with ten billion people, all of whom are extremely happy but just ever-so-slightly less happy than that single person.
In an infinite universe, there's already infinitely-many people, so I don't think this applies to my infinite ethical system.
First, consider a world inhabited by a single person enduring excruciating suffering. The average view entails that we could improve this world by creating a million new people whose lives were also filled with excruciating suffering if the suffering of the new people was ever-so-slightly less bad than the suffering of the original person.
Second, the average view entails the sadistic conclusion: It can sometimes be better to create lives with negative wellbeing than to create lives with positive wellbeing from the same starting point, all else equal.
In a finite universe, I can see why those verdicts would be undesirable. But in an infinite universe, there's already infinitely-many people at all levels of suffering. So, according to my own moral intuition at least, it doesn't seem that these are bad verdicts.
You might have differing moral intuitions, and that's fine. If you do have an issue with this, you could potentially modify my ethical system to make it an analogue of total utilitarianism. Specifically, consider the probability distribution something would have if it conditions on ending up somewhere in this universe, but doesn't even know if it will be an actual agent with preferences or not. That is, it uses some prior that allows for the possibility of ending up as a preference-free rock or something. Also, make sure the measure of life satisfaction treats existences with neutral welfare and the existences of things without preferences as zero. Now, simply modify my system to maximize the expected value of life satisfaction given this prior. That's my total-utilitarianism-infinite-analogue ethical system.
So, to give an example of how this works, consider the situation in which you can torture one person to avoid creating a large number of people with pretty decent lives. Well, the large number of people with pretty decent lives would increase the moral value of the world, because creating those people makes it more likely, according to that prior, that something would end up as an agent with positive life satisfaction rather than as some inanimate object, conditioning only on being something in this universe. But adding a tortured creature would only decrease the moral value of the universe. Thus, this total-utilitarian-infinite-analogue ethical system would prefer creating the large number of people with decent lives to torturing the one creature.
Of course, if you accept this system, then you have to find a way to deal with the repugnant conclusion, just like you need to find a way to deal with it under regular total utilitarianism in a finite universe. I've yet to see any satisfactory solution to the repugnant conclusion. But if there is one, I bet you could extend it to this total-utilitarian-infinite-analogue ethical system. This is because this ethical system is a lot like regular total utilitarianism, except it replaces "total number of creatures with satisfaction x" with "total probability mass of ending up as a creature with satisfaction x".
Given the lack of a satisfactory solution to the repugnant conclusion, I prefer the idea of just sticking with my average-utilitarianism-like infinite ethical system. But I can see why you might have different preferences.
Under my error model you run into trouble when you treat all transfinite amounts the same. From that perspective, recognising two transfinite amounts that could be different is progress.
I guess this is the part I don't really understand. My infinite ethical system doesn't even think about transfinite quantities. It only considers the prior probability over ending up in situations, which is always real-valued. I'm not saying you're wrong, of course, but I still can't see any clear problem.
Another attempt to throw a situation you might not be able to handle. Instead of having 2 infinite groups of unknown relative size all receiving the same bad thing, give, as compensation for the abuse, 1 slice of cake to one group and 2 slices of cake to the second group. Could there be a difference in the group size that perfectly balances the cake slice difference in order to keep cake expectation constant?
Are you asking if there is a way to simultaneously change the group size as well as change the relative amount of cake for each group so the expected number of cakes received is constant?
If this is what you mean, then my system can deal with this. First off, remember that my system doesn't worry about the number of agents in a group, but instead merely cares about the probability of an agent ending up in that group, conditioning only on being in this universe.
By changing the group size, however you define it, you can affect the probability of you ending up in that group. To see why, suppose you can do something to add any agents matching a certain situation-description into the group. Well, as long as this situation has a finite description length, the probability of ending up in that situation is non-zero, so adding or removing such agents can change the probability of ending up in that group.
So, currently, the expected value of cake received from these situations is P(in first group) * 1 + P(in second group) * 2. (For simplicity, I'm assuming no one else in the universe gets cake.) So, if you increase P(in second group) by u, you just need to decrease P(in first group) by 2u to keep the expectation constant.
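The bookkeeping above can be checked with a quick numeric sketch (the probability values here are hypothetical placeholders):

```python
# A small numeric check of the cake-expectation bookkeeping above.
# The probabilities are hypothetical placeholders for illustration.

def expected_cake(p_first, p_second):
    # First group gets 1 slice each, second group gets 2 slices each.
    return p_first * 1 + p_second * 2

p1, p2 = 0.30, 0.20
u = 0.05
base = expected_cake(p1, p2)
# Raising P(second group) by u adds 2u to the expectation, so
# lowering P(first group) by 2u exactly compensates.
adjusted = expected_cake(p1 - 2 * u, p2 + u)
assert abs(base - adjusted) < 1e-12
print(round(base, 3))  # -> 0.7
```

Each unit of probability mass in the first group is worth 1 slice and in the second group 2 slices, which is why the compensating shift is 2u rather than u.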
Additional challenging situation. Instead of giving 1 or 2 slices of cake, say that each slice is 3 cm wide, so the original choices are between 3 cm of cake and 6 cm of cake. Now take some custom amount of cake slice (say 2.7 cm), then determine what the group size would be to keep the world cake expectation the same. Then add 1 person to that group. Then convert that back to a cake slice width that keeps cake expectation the same. How wide is the slice?
If literally only one more person gets cake, even considering acausal effects, then this would in general not affect the expected value of cake. So the slice would still be 2.7 cm.
Now, perhaps you meant that you directly cause one more person to get cake, resulting acausally in infinitely-many others getting cake. If so, then here's my reasoning:
Previously, the expected value of cake received from these situations was P(in first group) * 1 + P(in second group) * 2. Since cake size is now non-constant, let's add a variable: P(in first group) * u + P(in second group) * 2. I'm assuming only the 1-slice group gets its cake amount adjusted; you can generalize beyond this. Here u represents the amount of cake the first group gets, with one 3 cm slice represented as 1.
Suppose adding the extra person acausally results in an increase of ε in the probability of ending up in the first group. So then, to avoid changing the expected value of cake, we need P(old probability of being in first group) * 1 = (P(old probability of being in first group) + ε) * u.
Solve that, and you get u = P(old probability of being in first group) / (P(old probability of being in first group) + ε). Just plug in the exact numbers for how much adding the person changes the probability of ending up in the group, and you can get an exact slice width.
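Treating the probability shift as a small real number (as argued above), the formula can be evaluated directly. The specific values of P and ε below are hypothetical placeholders:

```python
# A numeric illustration of u = P_old / (P_old + epsilon), treating the
# acausal probability shift epsilon as a small real number.
# P_old and epsilon are hypothetical placeholders.

def adjusted_slice_fraction(p_old, eps):
    # Solve  p_old * 1 = (p_old + eps) * u  for u.
    return p_old / (p_old + eps)

p_old = 0.10
eps = 1e-6
u = adjusted_slice_fraction(p_old, eps)
# u is just under 1: the slice shrinks almost imperceptibly.
assert 0.9999 < u < 1.0
# Check the expectation really is unchanged: p_old * 1 == (p_old + eps) * u
assert abs(p_old * 1 - (p_old + eps) * u) < 1e-15
# With 3 cm slices, the new width in cm is 3 * u (just under 3 cm).
print(3 * u)
```

The larger the acausal shift ε relative to P, the more the slice must shrink; for a genuinely tiny ε the width is essentially unchanged, matching the reply above.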
Another formulation of the same challenge: Define a real number r for which converting that to a group size would get you a group of 5 people.
I'm not sure what you mean here. What does it mean to convert a real number to a group size? One trivial way to interpret this is that the answer is 5: if you convert 5 to a group size, I guess(?) that means a group of five people. So, there you go, the answer would be 5. I take it this isn't what you meant, though.
Did you get on board about the difference between "help all the stars" and "all the stars as they could have been"?
No, I'm still not sure what you mean by this.
My point was more that, even if you can calculate the expectation, standard versions of average utilitarianism are usually rejected for non-infinitarian reasons (e.g. the repugnant conclusion) that seem like they would plausibly carry over to this proposal as well.
If I understand correctly, average utilitarianism isn't rejected due to the repugnant conclusion. In fact, it's the opposite: the repugnant conclusion is a problem for total utilitarianism, and average utilitarianism is one way to avoid it. I'm just going off what I read in The Stanford Encyclopedia of Philosophy, but I don't have any particular reason to doubt what it says.
Separately, while I understand the technical reasons for imposing boundedness on the utility function, I think you probably also need a substantive argument for why boundedness makes sense, or at least is morally acceptable. Boundedness below risks having some pretty unappealing properties, I think.
Yes, I do think boundedness is essential for a utility function. The issue with unbounded utility functions is that the expected value according to some probability distributions will be undefined. For example, if your utility follows a Cauchy distribution, then the expected utility is undefined.
Your actual probability distribution over utilities in an unbounded utility function wouldn't exactly follow a Cauchy distribution. However, I think that for whatever reasonable probability distribution you would use in real life, an unbounded utility function would still have an undefined expected value.
To see why, note that there is a non-zero probability that your utility really will be sampled from a Cauchy distribution. For example, suppose you're in some simulation run by aliens, and to determine your utility in your life after the simulation ends, they sample from the Cauchy distribution. (This is supposing that they're powerful enough to give you any utility.) I don't have any completely conclusive evidence to rule out this possibility, so it has non-zero probability. It's not clear to me why an alien would do the above, or that they would even have the power to, but I still have no way to rule it out with infinite confidence. So your expected utility, conditioning on being in this situation, would be undefined. As a result, you can prove that your total expected utility would also be undefined.
So it seems to me that the only way you can actually have your expected values be robustly well-defined is by having a bounded utility function.
Because the sigmoid function essentially saturates at very low levels of welfare, at some point you seem to end up in a perverse version of Torture vs. dust specks where you think it's ok (or indeed required) to have 3^^^3 people (whose lives are already sufficiently terrible) horribly tortured for fifty years without hope or rest, to avoid someone in the middle of the welfare distribution getting a dust speck in their eye.
In principle, I do think this could occur. I agree that at first it intuitively seems undesirable. However, I'm not convinced it is, and I'm not convinced that there is a value system that avoids this without having even more undesirable results.
It's important to note that the sufficiently terrible lives need to be really, really, really bad already. So much so that being horribly tortured for fifty years does almost exactly nothing to affect their overall satisfaction. For example, maybe they're already being tortured for more than 3^^^^3 years, so adding fifty more years does almost exactly nothing to their life satisfaction.
Maybe it still seems to you that getting tortured for 50 more years would still be worse than an average person getting a dust speck in the eye. If so, consider this scenario. You know you have a 50% chance of being tortured for more than 3^^^^3 years, and a 50% chance of not being tortured and living in a regular world. However, you have a choice: you can agree to a very minor form of discomfort, like a dust speck in your eye, in the case in which you aren't tortured, and in exchange you will be tortured for 50 fewer years if you do end up in the situation in which you get tortured. So I suppose, given what you say, you would take it. But suppose you were given this opportunity again. Well, you'd again be able to subtract 50 years of torture for just a dust speck, so I guess you'd take it again.
Imagine you're allowed to repeat this process for an extremely long time. If you think that getting one dust speck is worth it to avoid 50 years of torture, then I think you would keep accepting one more dust speck until your eyes have as much dust in them as they possibly could. And then, once you're done with that, you could go on to accepting some other extremely minor form of discomfort to avoid another 50 years of torture. Maybe you start accepting an almost-exactly-imperceptible amount of back pain for another 50-year torture reduction. And then you continue this until your back, and the rest of your body parts, hurt quite a lot.
Here's the result of your deals: you have a 50% chance of being incredibly uncomfortable. Your eyes are constantly blinded and heavily irritated by dust specks, and you feel a lot of pain all over your body. And you have a 50% chance of being horribly tortured for more than 3^^^^3 years. Note that even though your torture sentence gets reduced by 50 × <number of extremely minor discomforts you accept> years, the amount of time you spend tortured decreases by only a very, very, very small, almost infinitesimal proportion.
Personally, I'd much rather have a 50% chance of a life that's actually decent, even if it means I don't get to decrease the amount of time I'd possibly spend getting tortured by a near-infinitesimal proportion.
What if you still refuse? Well, the only way I can think of justifying your refusal is by having an unbounded utility function, so getting an extra 50 years of torture is around as bad as getting the first 50 years of torture. But as I've said, the expected values of unbounded utility functions seem to be undefined in reality, so this doesn't seem like a good idea.
My point in the above is that having someone be tortured for 50 more years could in principle be better than getting one more dust speck in someone's eye, provided the tortured person would already have been tortured for a super-ultra-virtually-infinitely long time anyway.
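The saturation point above can be made concrete with a toy calculation. Here the saturating disutility u(t) = -t/(t + K), the constant K, the base torture duration, and the dust-speck disutility are all made-up illustration values, not anything from the discussion itself; exact rationals are used so the tiny difference isn't lost to floating point:

```python
from fractions import Fraction

K = Fraction(100)  # arbitrary scale constant for illustration

def disutility(t):
    # Toy bounded disutility: approaches -1 as t grows, so it saturates
    # for astronomically long durations.
    return -Fraction(t) / (Fraction(t) + K)

base = Fraction(10) ** 30          # stand-in for an already-huge sentence
extra_torture = disutility(base + 50) - disutility(base)
dust_speck = Fraction(1, 10 ** 6)  # tiny but fixed disutility of one speck

# Under this bounded utility, 50 extra years on top of the huge sentence
# is less bad than one dust speck.
print(abs(extra_torture) < dust_speck)  # True
```

The marginal disutility of the extra 50 years comes out around 5e-57, dwarfed by even a one-in-a-million dust-speck disutility, which is the sense in which the bounded function "does almost exactly nothing" at the extreme.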
Oh, I'm sorry; you're right. I messed up on step two of my proposed proof that your technique would be vulnerable to the same problem.
However, it still seems to me that agents using your technique would also be concerningly likely to fail to cross, or to otherwise suffer from other problems. Like last time, suppose the agent crosses, and that it's provable in the agent's proof system that crossing implies utility -10. So if the agent decides to cross, it's either because of the chicken rule, because not crossing counterfactually results in utility -10, or because crossing counterfactually results in utility greater than -10.
If the agent crosses because of the chicken rule, then this is a bad reason, so the bridge will blow up.
I had already assumed that not crossing counterfactually results in utility greater than -10, so it can't be the middle case.
Suppose instead that the crossing counterfactual results in utility greater than -10. This seems very strange. By assumption, it's provable using the AI's proof system that crossing implies it will get -10 utility. And the AI's counterfactual environment is supposed to line up with reality.
So, in other words, the AI has decided to cross and has already proven that crossing entails it will get -10 utility. And if the counterfactual environment assigns greater than -10 utility, then that counterfactual environment provably, within the agent's proof system, doesn't line up with reality. So how do you get an AI to believe it will cross, believe crossing entails -10 utility, and still counterfactually think that crossing will result in greater than -10 utility?
In this situation, the AI can prove, within its own proof system, that the counterfactual environment of getting > -10 utility is wrong. So I guess we need an agent that allows itself to use a certain counterfactual environment even though it has already proved that that environment is wrong. I'm concerned about the functionality of such an agent. If it already ignores clear evidence that its counterfactual environment is wrong in reality, then that would really make me question the agent's ability to use counterfactual environments that do line up with reality in other situations.
So it seems to me that for an agent using your take on counterfactuals to cross, it would need to either think that not crossing counterfactually results in utility -10 or less, or to ignore conclusive evidence that the counterfactual environment it's using for its chosen action would in fact not line up with reality. Both of these options seem rather concerning to me.
Also, even if you do decide to let the AI ignore conclusive evidence (to the AI) that crossing makes utility -10, I'm concerned the bridge would get blown up anyways. I know we haven't formalized "a bad reason", but we've taken it to mean something like, "something that seems like a bad reason to the AI". If the AI wants its counterfactual environments to line up with reality, and it can clearly see that the environment for the action it decides to take doesn't line up with reality, then this seems like a "bad" reason to me.
Thanks for clearing some things up. There are still some things I don't follow, though.
You said my system would be ambivalent between sand and insult. I just want to make sure I understand what you're saying here. Is insult specifically throwing sand at the same people who get it thrown at them in dust, with the same amount of sand thrown at the same speed? If so, then it seems to me that my system would clearly prefer sand to insult. This is because there is some non-zero chance of an agent, conditioning only on being in this universe, being punched due to people like me choosing insult. This would make their satisfaction lower than it otherwise would be, thus decreasing the moral value of the universe if I chose insult over sand.
On the other hand, perhaps the number of people harmed by sand in "insult" would be lower than the number harmed by sand in "dust". In this situation, my ethical system could potentially prefer insult over dust. This doesn't seem like a bad thing to me, though, if it means you save some agents in certain agent-situation-descriptions from getting sand thrown at them.
Also, I'm wondering about your paragraph starting with, "The basic sitatuino is that I have intuitions which I can't formulate that well. I will try another route." If I'm understanding it correctly, I think I more or less agree with what you said in that paragraph. But I'm having a hard time understanding its significance. Are you intending to use it to show a potential problem with my ethical system? The paragraph after it makes it seem like you were, but I'm not really sure.
The fact that it's lavishly uncomputable is a problem for using it in practice, of course :-).
Yep. To be fair, though, I suspect any ethical system that respects agents' arbitrary preferences would also be incomputable. As a silly example, consider an agent whose terminal values are, "If Turing machine T halts, I want nothing more than to jump up and down. However, if it doesn't halt, then it is of the utmost importance to me that I never jump up and down and instead sit down and frown." Then any ethical system that cares about those preferences is incomputable.
Now, this is a pretty silly example, but I wouldn't be surprised if there were more realistic ones. For one, it's important to respect other agents' moral preferences, and I wouldn't be surprised if their ideal moral-preferences-on-infinite-reflection were incomputable. It seems to me that moral philosophers act as some approximation of, "Find the simplest model of morality that mostly agrees with my moral intuitions." If those models include incomputable ones, or arbitrary Turing machines that may or may not halt, then the moral value of the world to them would in fact be incomputable, so any ethical system that cares about preferences-given-infinite-reflection would also be incomputable.
I have some other concerns, but haven't given the matter enough thought to be confident about how much they matter. For instance: if the fundamental thing we are considering probability distributions over is programs specifying a universe and an experience-subject within that universe, then it seems like maybe physically bigger experience subjects get treated as more important because they're "easier to locate", and that seems pretty silly. But (1) I think this effect may be fairly small, and (2) perhaps physically bigger experience-subjects should on average matter more because size probably correlates with some sort of depth-of-experience?
I'm not that worried about agents that are physically bigger, but it's true that there may be some agents, or agent-situation-descriptions, that are easier to pick out (in terms of having a short description length) than others. Maybe there's something really special about an agent that makes it easy to pin down.
I'm not entirely sure if this would be a bug or a feature. But if it's a bug, I think it could be dealt with by just choosing the right prior over agent-situations. Specifically, for any description of an environment with a finite set of agents A, make the probability of ending up as a given agent a ∈ A, conditioned only on being one of the agents in that environment, constant across all a ∈ A. This way, the prior isn't biased in favor of the agents that are easy to pick out.
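A minimal sketch of that kind of prior: weight environments by description complexity, but split each environment's mass uniformly over its agents so no agent is favored for being "easy to locate". The environment names, description lengths, and agent counts below are made-up illustration values.

```python
# Hypothetical environments with illustrative description lengths (bits)
# and agent counts.
environments = {
    "env_a": {"description_bits": 10, "num_agents": 3},
    "env_b": {"description_bits": 12, "num_agents": 5},
}

# Complexity-weighted prior over environments, normalized to sum to 1.
raw = {name: 2.0 ** -env["description_bits"] for name, env in environments.items()}
Z = sum(raw.values())

prior = {}
for name, env in environments.items():
    p_env = raw[name] / Z
    for i in range(env["num_agents"]):
        # Uniform within the environment: every agent in env gets the
        # same share, regardless of how easy it is to pick out.
        prior[(name, i)] = p_env / env["num_agents"]

print(abs(sum(prior.values()) - 1.0) < 1e-12)  # True
```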
If we define "bad reasoning" as "crossing when there is a proof that crossing is bad" in general, this begs the question of how to evaluate actions. Of course the troll will punish counterfactual reasoning which doesn't line up with this principle, in that case. The only surprising thing in the proof, then, is that the troll also punishes reasoners whose counterfactuals respect proofs (EG, EDT).
I'm concerned that you may not have realized that your own current take on counterfactuals respects proofs to some extent, and that, if I'm reasoning correctly, this could result in agents that use it failing the Troll Bridge problem.
You said in "My current take on counterfactuals" that counterfactuals should line up with reality. That is, the action the agent actually takes should result in the utility it was said to have in its counterfactual environment.
You say that a "bad reason" is one that the agent's procedure would think is bad. The counterfactuals in your approach are supposed to line up with reality, so if an AI's counterfactuals don't line up with reality, then this seems like a "bad" reason according to the definition you gave. Now, if you let your agent think, "I'll get < -10 utility if I don't cross", then it could potentially cross and not get blown up. But this seems like a very unintuitive and seemingly ridiculous counterfactual environment. Because of this, I'm pretty worried it could result in an AI with such counterfactual environments malfunctioning somehow. So I'll assume the AI doesn't have such a counterfactual environment.
Suppose acting using a counterfactual environment that doesn't line up with reality counts as a "bad" reason for agents using your counterfactuals. Also suppose that in the counterfactual environment in which the agent doesn't cross, the agent counterfactually gets more than -10 utility. Then:
- Suppose □(cross → U = -10), i.e., the agent's proof system proves that crossing implies -10 utility.
- Suppose the agent crosses. Then it must be because either it used the chicken rule or because its counterfactual environment doesn't line up with reality in this case. Either way, this is a bad reason for crossing, so the bridge gets blown up. Thus, the AI gets -10 utility.
- Thus, □(cross → U = -10) → (cross → U = -10).
- Thus, by Löb's theorem, cross → U = -10.
Thus, either the agent doesn't cross the bridge or it does and the bridge explodes. You might just decide to get around this by saying it's okay for the agent to think it would get less than -10 utility if it didn't cross. But I'm rather worried that this would cause other problems.
You seem to be assuming that the agent's architecture has solved the problem of logical updatelessness, IE, of applying reasoning only to the (precise) extent to which it is beneficial to do so. But this is one of the problems we would like to solve! So I object to the "stop thinking about it" step w/o more details of the decision theory which allows you to do so.
I'll talk about some ways I thought of potentially formalizing, "stop thinking if it's bad".
One simple way to try to do so is to have an agent use regular evidential decision theory but have a special "stop thinking about this thing" action that it can take. Every so often, the agent considers taking this action using regular evidential decision theory. So, in the Troll Bridge case, it could potentially see that the path of reasoning it's following is potentially dangerous, and thus decide to stop. Also, the agent needs to avoid thinking too many thoughts before considering taking the "stop thinking about this thing" action. Otherwise, it could think all sorts of problematic thoughts before being able to stop itself.
This simple technique might actually be enough to solve the problem, especially if the AI has the ability to choose its own inference algorithm to find one that makes the AI able to realize, "thinking about this is bad" before it finds the concrete bad thing. And, for what it's worth, it's enough for me personally to get across the bridge.
Now, this might not work reliably, because the AI might stumble upon a problematic thought before realizing it's bad. So, instead, consider this. The AI has the ability to "roll back" its belief state and then adjust its search path. For example, suppose the AI thinks through the Troll Bridge proof and then realizes that it really messed up in doing so. Then it could have a hard-coded way of rolling back its belief state to what it was before it thought about the Troll Bridge thing, mark the nodes it expanded as verboten, and then go expand some other node.
When should it rollback? Well, it shouldn't rollback every time it comes across any evidence that decreases its expected utility. Doing so would result in the AI being very poorly calibrated by ignoring all negative evidence.
One option is to manually come up with a rollback method that rolls back whenever doing so increases expected utility, without resulting in poor calibration.
However, this sounds sort of hard. One cheap trick to get around doing this is to just give the AI a special action to modify its rollback method, and then to select the new rollback method using its regular decision procedure. For example, I think it's clear, using evidential decision theory, that choosing the rollback method, "Roll back any thought that decreases my expected utility estimate", would result in low utility, because that would just mean the AI would delude itself. But I bet a rollback method that says, "Roll back if you find any Troll Bridge proofs", would work okay.
This trick might not be perfect, since the AI could potentially think about something problematic before acquiring a rollback procedure good enough to roll it back. But as long as the AI is smart enough to realize that it should try to get a really good rollback function before doing much of anything else, I bet it would work okay.
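The snapshot-and-rollback idea above can be sketched in a few lines. This is only a toy: the thoughts, the trigger pattern, and the class structure are all invented for illustration, not a real reasoner.

```python
import copy

class RollbackReasoner:
    """Toy reasoner that checkpoints beliefs before each thought and rolls
    back any thought matching a hard-coded 'problematic' pattern."""

    def __init__(self):
        self.beliefs = set()
        self.verboten = set()  # nodes we've rolled back and won't re-expand

    def expand(self, thought):
        if thought in self.verboten:
            return False
        snapshot = copy.deepcopy(self.beliefs)  # checkpoint before thinking
        self.beliefs.add(thought)
        if "crossing -> U = -10" in thought:    # rollback trigger
            self.beliefs = snapshot             # restore pre-thought state
            self.verboten.add(thought)          # mark the node verboten
            return False
        return True

r = RollbackReasoner()
r.expand("the bridge looks sturdy")
r.expand("proof sketch: crossing -> U = -10")
print(sorted(r.beliefs))  # the problematic thought was rolled back
```

A real version would, of course, need the trigger to be selected by the agent's own decision procedure rather than hard-coded, as discussed above.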
Also, don't forget that we still need to do something about the agent-simulates-predictor problem. In the agent-simulates-predictor problem, agents are penalized for thinking about things in too much detail. And in whatever counterfactual environment you use, you'll need a way to deal with the agent-simulates-predictor problem. I think the most obvious approach is by controlling what the AI thinks about. And if you've already done that, then you can pass the Troll Bridge problem for free.
Also, I think it's important to note that just the fact that the AI is trying to avoid thinking of crossing-is-bad proofs makes the proofs (potentially) not go through. For example, in the proof you originally gave, you supposed there is a proof that crossing results in -10 utility, and thus concluded the agent must have crossed because of the chicken rule. But if the AI is trying to avoid these sorts of "proofs", then if it does cross, it simply could have been because the AI decided to avoid following whatever train of thought would prove that it would get -10 utility. This is considered a reasonable thing to do by the AI, so it doesn't seem like a "bad" reason.
There may be possible alternative proofs that apply to an AI that tries to steer its reasoning away from problematic areas. I'm not sure, though. I also suspect that any such proofs would be more complicated and thus harder to find.
So let's try again. The key thing in your system is not a program that outputs a hypothetical being's stream of experiences, it's a program that outputs a complete description of a (possibly infinite) universe and also an unambiguous specification of a particular experience-subject within that universe. This is only possible if there are at most countably many experience-subjects in said universe, but that's probably OK.
That's closer to what I meant. By "experience-subject", I think you mean a specific agent at a specific time. If so, my system doesn't require an unambiguous specification of an experience-subject.
My system doesn't require you to pinpoint the exact agent. Instead, it only requires you to specify a (reasonably-precise) description of an agent and its circumstances. This doesn't mean picking out a single agent, as there may be infinitely-many agents that satisfy such a description.
As an example, a description could be something like, "Someone named gjm in a 2021-Earth-like world with personality <insert a description of your personality and thoughts> who has had <insert description of your life experiences> and is currently <insert description of how your life currently is>".
This doesn't pick out a single individual. There are probably infinitely-many gjms out there. But as long as the description is precise enough, you can still infer your probable eventual life satisfaction.
But other than that, your description seems pretty much correct.
It's now stupid-o'-clock where I am and I need to get some sleep.
I feel you. I also posted something at stupid-o'-clock and then woke up at 5am, realized I had messed up, and then edited the comment and hoped no one had seen the previous error.
The integactions are all supposed to be negative in peace, punch, dust, insult. The surprising thing to me would be that the system would be ambivalent between sand and insult being a bad idea. If we don't necceasrily prefer D to C when helping does it matter if we torture our people a lot or a little as its going to get infinity saturated anyway.
Could you explain what insult is supposed to do? You didn't say in the previous comment. Does it causally hurt infinitely-many people?
Anyways, it seems to me that my system would not be ambivalent about whether you torture people a little or a lot. Let C be the class of finite descriptions of circumstances of agents in the universe that would get hurt a little or a lot if you decide to hurt them. The probability of an agent ending up in class C is non-zero. But if you decide to torture them a lot their expected life-satisfaction would be much lower than if you decide to torture them a little. Thus, the total moral value of the universe would be lower if you decide to torture a lot rather than a little.
When I say "world has finite or infinite people" that is "within description" say that there are infinite people because I believe there are infinitely many stars. Then all the acausal copies of sol are going to have their own "out there" stars. Acts that "help all the stars" and "all the stars as they could have been" are different. Atleast until we consider that any agent that decides to "help all the stars" will have acausal shadows "that could have been". But still this consideration increases the impact on the multiverse (or keeps it the same if moving from a monoverse to a multiverse in the same step).
I can't say I'm following you here. Specifically, how do you consider, "help all the stars" and "all the stars as they could have been" to be different? I thought, "help" meant, "make it better than it otherwise could have been". I'm also not sure what counts as acausal shadows. I, alas, couldn't find this phrase used anywhere else online.
If I have real, non-zero impacts for infinite amount of people naively that would add up to a more than finite aggregate.
Remember that my ethical system doesn't aggregate anything across all agents in the universe. Instead, it merely considers finite descriptions of situations an agent in the universe could be in, and then aggregates the expected value of satisfaction in these situations, weighted by their probability conditioning only on being in this universe.
There's no way for this to be infinite. The probabilities of all the situations sum to 1 (they are assumed to be disjoint), and the measure of life satisfaction was said to be bounded.
And remember, my system doesn't first find your causal impact on the moral value of the universe and then somehow use this to find the acausal impact. Because in our universe, I think the causal impact will always be zero. Instead, it directly considers acausal impacts. And your acausal impact on the moral value of the universe will always be finite and non-infinitesimal.
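The finiteness claim above is just the arithmetic fact that a probability-weighted average of bounded values is itself bounded. A minimal numeric sketch, with made-up situation probabilities and satisfaction values, and satisfaction assumed bounded in [-1, 1] purely for illustration:

```python
# Hypothetical disjoint situation classes with probabilities summing to 1,
# and satisfaction values assumed to lie in [-1, 1].
probs = [0.6, 0.3, 0.1]
satisfaction = [0.9, -1.0, 0.2]

# The aggregate is a convex combination of bounded values, so it is
# necessarily bounded (and in particular finite).
moral_value = sum(p * s for p, s in zip(probs, satisfaction))
print(moral_value)
```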
Thanks for responding. As I said, the measure of satisfaction is bounded. And all bounded random variables have a well-defined expected value. Source: Stack Exchange.
Oh, I'm sorry; I misunderstood you. When you said the average of utilities, I thought you meant the utility averaged among all the different agents in the world. Instead, it's just, roughly, an average over the probability density function of utility. I say roughly because I guess integration isn't exactly an average.
Please see this comment for an explanation.
RE: scenario one:
All these worlds come out exactly the same, so "infinitely many happy, one unhappy" is indistinguishable from "infinitely many unhappy, one happy"
It's not clear to me how they are indistinguishable. As long as the unhappy agent and its circumstances can be described with a finite description length, there is a non-zero probability of an agent ending up as that one. Thus, making the agent unhappy would decrease the moral value of the world.
I'm not sure what would happen if the single unhappy agent has infinite complexity and probability 0. But I suspect this could be dealt with if you expanded the system to also consider non-real probabilities. I'm no expert on non-real probabilities, but I bet the probability of being unhappy, given that there is an unhappy agent, would be infinitesimally higher than in the world in which there are no unhappy agents.
RE: scenario two: It's not clear to me how this is crazy. For example, consider this situation: when agents are born, an AI flips a biased coin to determine what will happen to them. Each coin has a 99.999% chance of landing on heads and a 0.001% chance of landing on tails. If the coin lands on heads, the AI will give the agent some very pleasant experience stream, and all such agents will get the same pleasant experience stream. But if it lands on tails, the AI will give the agent some unpleasant experience stream that is also very different from the other unpleasant ones.
This sounds like a pretty good situation to me. It's not clear to me why it wouldn't be. I mean, I don't see why the diversity of the positive experiences matters. And if you do care about the diversity of positive experiences, this would have unintuitive results. For example, suppose all agents have identical preferences and their satisfaction is maximized by experience stream S. Well, if you have a problem with all the satisfied agents having just one experience stream, then you would be incentivized to coerce the agents to instead have a variety of different experience streams, even if they didn't like those experience streams as much.
RE: scenario three:
The CotU has computed that with the switch in the "Nice" position, the expected utility of an experience-subject in the resulting universe is large and positive; with the switch in the "Nasty" position, it's large and negative. But in both cases every possible experience-subject has a nonzero probability of being generated at any time.
I don't follow your reasoning. You just said that in the "Nice" position the expected value is large and positive, and in the "Nasty" position it's large and negative. And since my ethical system seeks to maximize the expected value of life satisfaction, it seems trivial to me that it would prefer the "Nice" position.
Switching it to the "Nice" position won't rule out any possible outcomes for an agent, but it seems pretty clear that it would change their probabilities.
RE: scenario four: My ethical system would prefer the "Nice" position for the same reason described in scenario three.
RE: scenario five:
So far as I can tell, there is no difference in the set of possible experience-subjects in the world where you do and the world where you don't. Both the tortured-to-death and the not-tortured-to-death versions of me are apparently possibilities, so it seems that with probability 1 each of them will occur somewhere in this universe, so neither of them is removed from our set of possible experience-streams when we condition on occurrence in our universe.
Though none of the experience streams are impossible, the probability of you getting tortured is still higher conditioning on me deciding to torture you. To see why, consider the situation, "Is someone just like Slider who is vulnerable to being tortured by demon lord Chantiel". This has finite description length, and thus non-zero probability. And if I decide to torture you, then the probability of getting tortured for agents in this situation is high. Thus, the total expected value of life satisfaction would be lower if I decided to torture you. So my ethical system would recommend not torturing you.
In general, don't worry about if an experience stream is possible or not. In an infinite universe with quantum noise, I think pretty much all experience streams would occur with non-zero probability. But you can still adjust the probabilities of an agent ending up with the different streams.
By one logic because we prefer B to A then if we "acausalize" this we should still preserve this preference (because "the amount of copies granted" would seem to be even handed), so we would expect to prefer D to C. However in a system where all infinites are of equal size then C=D and we become ambivalent between the options.
We shouldn't necessarily prefer D to C. Remember that one of the main things you can do to increase the moral value of the universe is to causally help other creatures, so that other people in sufficiently similar circumstances to you will also help, meaning you acausally make them help others. Suppose you instead have the option to causally help all of the agents that would have been acausally helped if you had just causally helped one agent. Then the AI shouldn't prefer D to C, because the results are identical.
Here an adhoc analysis might choose a clear winner. Then consider the combined problem where you have all the options, peace, punch, dust and insult. Having 1 analysis that gets applied to all options equally will run into trouble. If the analysis is somehow "turn options into real probablities" then problem with infinidesimals are likely to crop up.
Could you explain how this would cause problems? If those are the options, it seems like a clear-cut case of my ethical system recommending peace, unless there is some benefit to punching, insulting, or throwing sand that you haven't mentioned.
To see why: if you decide to throw sand, you're decreasing the satisfaction of agents in situations of the form, "Can get sand thrown at them by someone just like Slider". This would in general decrease the moral value of the world, so my system wouldn't recommend it. The same reasoning shows that the system wouldn't recommend punching or insulting.
There could be the "modulo infinity" failure mode where peace and dust get the same number. "One class only" would fail to give numbers for one of the subproblems.
Interesting. Could you elaborate?
I'm not really clear on why you are worried about these different classes. Remember that any action you take will, at least acausally, help a countably infinite number of agents. Similarly, I think all your actions will have some real-valued effect on the moral value of the universe. To see why, just note that as long as you help one agent, you increase the expected satisfaction of agents in situations of the form, "<description of the circumstances of the above agent> who can be helped by someone just like Slider". This has finite complexity, and thus real and non-zero probability. And the moral value of the universe is capped by the bounds of the life satisfaction measure, so you can't have infinite increases to the moral value of the universe, either.
I'll begin at the end: What is "the expected value of utility" if it isn't an average of utilities?
I'm just using the regular notion of expected value. That is, let P(u) be the probability density of getting utility u. Then the expected value of utility is ∫ u P(u) du, where the integral is a Lebesgue integral for greater generality. Above, I take utility to lie in a bounded interval.
Also note that my system cares about a measure of satisfaction, rather than utility specifically. In that case, just take P(u) to be the probability density of that measure of life satisfaction instead of utility.
Also, of course, P(u) is calculated conditioning on being an agent in this universe, and nothing else.
And how do you calculate P(u) given the above? Well, one way is to first start with some prior probability distribution over disjoint hypotheses about which universe and situation you're in, where the situations are concrete enough to determine your eventual life satisfaction. Then do a Bayes update on "is an agent in this universe" by setting to zero the probabilities of hypotheses in which the agent isn't in this universe or doesn't have preferences. Then just renormalize the probabilities so they sum to 1. After that, you can use this probability distribution over possible worlds W to calculate P(u) in a straightforward manner, e.g. P(u) = Σ_W P(W) P(u | W).
(I know I pretty much mentioned the above calculation before, but I thought rephrasing it might help.)
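A minimal numeric sketch of that calculation: a prior over world-hypotheses, a Bayes update that zeroes out hypotheses where the agent isn't in this universe, renormalization, then expected utility as a mixture over the surviving worlds. All the numbers are made-up illustration values.

```python
# Hypothetical world-hypotheses:
# (prior probability, agent_is_in_this_universe, expected utility in world)
worlds = [
    (0.5, True, 0.8),
    (0.3, False, -0.2),  # zeroed out by the Bayes update
    (0.2, True, -0.4),
]

# Update: zero out worlds where the agent isn't in this universe, then
# renormalize so the surviving probabilities sum to 1.
updated = [(p if in_universe else 0.0, u) for p, in_universe, u in worlds]
Z = sum(p for p, _ in updated)
posterior = [(p / Z, u) for p, u in updated]

# Expected utility as a probability-weighted mixture over worlds.
expected_utility = sum(p * u for p, u in posterior)
print(expected_utility)
```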
Post is pretty long winded,a bit wall fo texty in a lot of text which seems like fixed amount of content while being very claimy and less showy about the properties.
Yeah, I see what you mean. I have a hard time balancing between being succinct and providing sufficient support and detail. It actually used to be shorter, but I lengthened it to address concerns brought up in a review.
My suspicion is that the acausal impact ends up being infinidesimal anyway. Even if one would get finite probability impact for probabilties concerning a infinite universe for claims like "should I help this one person" then claims like "should I help these infinite persons" would still have an infinity class jump between the statements (even if both need to have an infinite kick into the universe to make a dent there is an additional level to one of these statements and not all infinities are equal).
Could you elaborate what you mean by a class jump?
Remember that if you ask, "should I help this one person", that is another way of saying, "should I (acausally) help this infinite class of people in similar circumstances". And I think in general the cardinality of this infinity would be the same as the cardinality of people helped by considering "should I help these infinitely-many persons"
Most likely the number of people in this universe is countably infinite, and all situations are repeated infinitely-many times. Thus, asking, "should I help this one person" would acausally help infinitely-many people, just as causally helping the infinitely-many people would.
I am going to anticipate that your scheme will try to rule out statements like "should I help these infinite persons" for a reason like "it's not of finite complexity". I am not convinced that finite-complexity descriptions are good guarantees that the described condition makes for a finite proportion of possibility space. I think "getting a perfect bullseye" is a description of finite complexity, but it describes an outcome of (real) probability 0. Being positive is no guarantee of finitude; infinitesimal chances would spell trouble for the theory. And if statements like "Slider (or a near equivalent) gets a perfect bullseye" are disallowed for not being finitely groundable, then most references to infinite objects are ruled out anyway. It's not exactly an infinite ethic if it is not allowed to refer to infinite things.
No, my system doesn't rule out statements of the form, "should I help these infinitely-many persons". This can have finite complexity, after all, provided there is sufficient regularity in who will be helped. Also, don't forget, even if you're just causally helping a single person, you're still acausally helping infinitely-many people. So, in a sense, ruling out helping infinitely-many people would rule out helping anyone.
I am also slightly worried that "description cuts" will allow "doubling the ball" kinds of events where total probability doesn't get preserved. That phenomenon gets around the theoretical problems by designating some sets non-measurable. But then being a set doesn't mean it's measurable. I am worried that "descriptions always have a usable probability" is too lax and will bleed from the edges, like a naive assumption that all sets are measurable would.
I'm not sure what specifically you have in mind with respect to doubling the sphere-esque issues. But if your system of probabilistic reasoning doesn't preserve the total probability when partitioning an event into multiple events, that sounds like a serious problem with your probabilistic reasoning system. I mean, if your reasoning system does this, then it's not even a probability measure.
If you can prove A = B₁ ∪ B₂ ∪ ... (with the Bᵢ disjoint), but the system still says P(A) ≠ P(B₁) + P(B₂) + ..., then you aren't satisfying one of the basic desiderata that motivated Bayesian probability theory: asking the same question in two different ways should result in the same probability. And P(B₁) + P(B₂) + ... is just another way of asking for P(A).
Of course you can make moral decisions without going through such calculations. We all do that all the time. But the whole issue with infinite ethics -- the thing that a purported system for handling infinite ethics needs to deal with -- is that the usual ways of formalizing moral decision processes produce ill-defined results in many imaginable infinite universes. So when you propose a system of infinite ethics and I say "look, it produces ill-defined results in many imaginable infinite universes", you don't get to just say "bah, who cares about the details?" If you don't deal with the details you aren't addressing the problems of infinite ethics at all!
Well, I can't say I exactly disagree with you here.
However, I want to note that this isn't a problem specific to my ethical system. It's true that in order to use my ethical system to make precise moral verdicts, you need to more fully formalize probability theory. However, the same is also true with effectively every other ethical theory.
For example, consider someone learning about classical utilitarianism and its applications in a finite world. Then they could argue:
Okay, I see your ethical system says to make the balance of happiness to unhappiness as high as possible. But how am I supposed to know what the world is actually like and what the effects of my actions are? Do other animals feel happiness and unhappiness? Is there actually a heaven and Hell that would influence moral choices? This ethical system doesn't answer any of this. You can't just handwave this away! If you don't deal with the details you aren't addressing the problems of ethics at all!
Also, I just want to note that my system as described seems to be unique among the infinite ethical systems I've seen in that it doesn't make obviously ridiculous moral verdicts. Every other one I know of makes some recommendations that seem really silly. So, despite not providing a rigorous formalization of probability theory, I think my ethical system has value.
But what you actually want (I think) isn't quite a probability distribution over universes; you want a distribution over experiences-in-universes, and not your experiences but those of hypothetical other beings in the same universe as you. So now think of the programs you're working with as describing not your experiences necessarily but those of some being in the universe, so that each update is weighted not by Pr(I have experience X | my experiences are generated by program P) but by Pr(some subject-of-experience has experience X | my experiences are generated by program P), with the constraint that it's meant to be the same subject-of-experience for each update. Or maybe by Pr(a randomly chosen subject-of-experience has experience X | my experiences are generated by program P) with the same constraint.
Actually, no, I really do want a probability distribution over what I would experience, or more generally, the situations I'd end up being in. The alternatives you mentioned, Pr(some subject-of-experience has experience X | my experiences are generated by program P) and Pr(a randomly chosen subject-of-experience has experience X | my experiences are generated by program P), both lead to problems for the reasons you've already described.
I'm not sure what made you think I didn't mean, P(I have experience x | ...). Could you explain?
We're concerned about infinitarian paralysis, where we somehow fail to deliver a definite answer because we're trying to balance an infinite amount of good against an infinite amount of bad. So far as I can see, your system still has this problem. E.g., if I know there are infinitely many people with various degrees of (un)happiness, and I am wondering whether to torture 1000 of them, your system is trying to calculate the average utility in an infinite population, and that simply isn't defined.
My system doesn't compute the average utility of anything. Instead, it tries to compute the expected value of utility (or life satisfaction). I'm sorry if this was somehow unclear. I didn't think I ever mentioned I was dealing with averages anywhere, though. I'm trying to get better at writing clearly, so if you remember what made you think this, I'd appreciate hearing.
You say, "There must be some reasonable way to calculate this."
(where "this" is Pr(I'm satisfied | I'm some being in such-and-such a universe)) Why must there be? I agree that it would be nice if there were, of course, but there is no guarantee that what we find nice matches how the world actually is.
To use probability theory to form accurate beliefs, we need a prior. I didn't think this was controversial. And if you have a prior, as far as I can tell, you can then compute Pr(I'm satisfied | I'm some being in such-and-such a universe) by simply updating on "I'm some being in such-and-such a universe" using Bayes' theorem.
That is, you need to have some prior probability distribution over concrete specifications of the universe you're in and your situation in it. Now, to update on "I'm some being in such-and-such a universe", just look at each concrete possible situation-and-universe and set P("I'm some being in such-and-such a universe" | some concrete hypothesis) to 0 if the hypothesis specifies you're in some universe other than the such-and-such universe, and to 1 if it does specify you are in such a universe. As long as the possible universes are specified sufficiently precisely, I don't see why you couldn't do this.
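Here's a minimal sketch of this zero-out-and-renormalize update, with hypothetical universes and made-up prior probabilities:

```python
from fractions import Fraction

# Hypothetical prior over concrete hypotheses; each hypothesis records
# which universe it places you in.  Numbers are illustrative only.
prior = {
    ("such-and-such universe", "situation A"): Fraction(1, 4),
    ("such-and-such universe", "situation B"): Fraction(1, 4),
    ("other universe",         "situation C"): Fraction(1, 2),
}

# Update on "I'm some being in such-and-such a universe": the likelihood
# is 1 if the hypothesis places you there, else 0, so hypotheses placing
# you elsewhere simply drop out.
posterior = {h: p for h, p in prior.items()
             if h[0] == "such-and-such universe"}

# Renormalize so the remaining probabilities sum to 1.
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}
# Each remaining hypothesis now has probability 1/2.
```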
Kind of hard to get a handle on.
Are you referring to it being hard to understand? If so, I appreciate the feedback and am interested in the specifics of what is difficult to understand. Clarity is a top priority for me.
If I have a choice of (finitely) helping a single human and I believe there to be infinite humans, then the probability of a human being helped in my world will nudge less than a real number. And if we want to stick with probabilities being real, then the rounding will make for infinitarian paralysis.
You are correct that a single human would have 0 or infinitesimal causal impact on the moral value of the world or the satisfaction of an arbitrary human. However, it's important to note that my system requires you to use a decision theory that considers not just your causal impacts, but also your acausal ones.
Remember that if you decide to take a certain action, that implies that other agents who are sufficiently similar to you and in sufficiently similar circumstances also take that action. Thus, you can acausally have non-infinitesimal impact on the satisfaction of agents in situations of the form, "An agent in a world with someone just like Slider who is also in very similar circumstances to Slider's." The above scenario is of finite complexity and isn't ruled out by evidence. Thus, the probability of an agent ending up in such a situation, conditioning only on being some agent in this universe, is nonzero.
Another scenario raises the possibility of the specter of fanaticism. Say by doing murder I can create an AI that will make all future agents happy, but being murdered is not happy times. Comparing agents before and after the singularity might make sense. And so might killing different finite amounts of people. But mixing them gets tricky, or favours the "wider class". One could think of a distribution where for values between 0 and 4 you up the utility by 1, except for pi (or any single real (or any set of measure 0)) for which you lower it by X. Any finite value for X will not be able to nudge the expectation value anywhere. Real ranges vs real ranges makes sense, discrete sets vs discrete sets makes sense, but when you cross transfinite Archimedean classes one is in trouble.
I'm not really following what you see as the problem here. Perhaps my above explanation clears things up. If not, would you be willing to elaborate on how transfinite Archimedean classes could potentially lead to trouble?
Also, to be clear, my system only considers finite probabilities and finite changes to the moral value of the world. Perhaps there's some way to extend it beyond this, but as far as I know it's not necessary.
(Assuming you've read my other response to this comment:)
I think it might help if I give a more general explanation of how my moral system can be used to determine what to do. This is mostly taken from the article, but it's important enough that I think it should be restated.
Suppose you're considering taking some action that would benefit our world or future life cone. You want to see what my ethical system recommends.
Well, for almost all possible circumstances an agent could end up in within this universe, I think your action would have effectively no causal or acausal effect on them. There's nothing you can do about them, so don't worry about them in your moral deliberation.
Instead, consider agents of the form, "some agent in an Earth-like world (or in the future light-cone of one) with someone just like <insert detailed description of yourself and circumstances>". These are agents you can potentially (acausally) affect. If you take an action to make the world a better place, that means the other people in the universe who are very similar to you and in very similar circumstances would also take that action.
So if you take that action, then you'd improve the world, so the expected value of the life satisfaction of an agent in the above circumstances would be higher. Such circumstances are of finite complexity and not ruled out by evidence, so the probability of an agent ending up in such a situation, conditioning only on being in this universe, is non-zero. Thus, taking that action would increase the moral value of the universe, and my ethical system would thus be liable to recommend taking it.
To see it another way, moral deliberation with my ethical system works as follows:
I'm trying to make the universe a better place. Most agents are in situations in which I can't do anything to affect them, whether causally or acausally. But there are some agents in situations that I can (acausally) affect. So I'm going to focus on making the universe as satisfying as possible for those agents, using some impartial weighting over those possible circumstances.
How is it a distribution over possible agents in possible universes (plural) when the idea is to give a way of assessing the merit of one possible universe?
I do think JBlack understands the idea of my ethical system and is using it appropriately.
My system provides a method of evaluating the moral value of a specific universe. The point of moral agents is to try to make the universe one that scores highly on this moral valuation. But we don't know exactly which universe we're in, so to make decisions, we need to consider all the universes we could be in, and then take the action that maximizes the expected moral value of the universe we're actually in.
For example, suppose I'm considering pressing a button that will either make everyone very slightly happier, or make everyone extremely unhappy. I don't actually know which universe I'm in, but I'm 60% sure I'm in the one in which the button makes everyone happier. Then if I press the button, there's a 40% chance that the universe ends up with very low moral value. That means pressing the button would, in expectation, decrease the moral value of the universe, so my moral system would recommend not pressing it.
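As a quick sketch of that expected-value comparison (the numerical moral values here are made up for illustration):

```python
# Illustrative numbers only: assumed moral values of the universe under
# each outcome, in arbitrary units.
p_good_universe = 0.6       # credence I'm in the "slightly happier" universe
v_baseline = 0.0            # moral value if the button is never pressed
v_slightly_happier = 0.1    # everyone very slightly happier
v_very_unhappy = -10.0      # everyone extremely unhappy

# Expected moral value of each choice.
ev_press = (p_good_universe * v_slightly_happier
            + (1 - p_good_universe) * v_very_unhappy)
ev_dont = v_baseline

print(ev_press, ev_dont)  # ≈ -3.94 vs 0.0, so: don't press
```

The small chance of a catastrophically bad universe dominates the expectation, which is why the system recommends not pressing.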
Even if somehow this is what OP meant, though -- or if OP decides to embrace it as an improvement -- I don't see that it helps at all with the problem I described; in typical cases I expect picking a random agent in a credence-weighted random universe-after-I-do-X to pose all the same difficulties as picking a random agent in a single universe-after-I-do-X. Am I missing some reason why the former would be easier?
I think to some extent you may be over-thinking things. I agree that it's not completely clear how to compute P("I'm satisfied" | "I'm in this universe"). But to use my moral system, I don't need a perfect, rigorous solution to this, nor am I trying to propose one.
I think the ethical system provides reasonably straightforward moral recommendations in the situations we could actually be in. I'll give an example of such a situation that I hope is illuminating. It's paraphrased from the article.
Suppose you have the ability to create safe AI and are considering whether my moral system recommends doing so. And suppose that if you create the safe AI, everyone in your world will be happy, and if you don't, the world will be destroyed by an evil rogue AI.
Consider an agent that knows it will be in this universe, but nothing else. Well, consider the circumstances, "I'm an agent in an Earth-like world that contains someone who is just like gjm and in a very similar situation, who has the ability to create safe AI". The above description has finite length, and the agent has no evidence ruling it out. So the agent must assign some non-zero probability to ending up in such a situation, conditioning on being somewhere in this universe.
All the gjms have the same knowledge and values and are in pretty much the same circumstances. So their actions are logically constrained to be the same as yours. Thus, if you decide to create the AI, you are acausally determining the outcome of arbitrary agents in the above circumstances, by making such an agent end up satisfied when they otherwise wouldn't have been. Since an agent in this universe has non-zero probability of ending up in those circumstances, by choosing to make the safe AI you are increasing the moral value of the universe.
How does that cash out if not in terms of picking a random agent, or random circumstances in the universe?

So, remember, the moral value of the universe according to my ethical system depends on P(I'll be satisfied | I'm some creature in this universe).
There must be some reasonable way to calculate this. And one that doesn't rely on the impossible task of taking a uniform sample from a set that has no uniform distribution. Now, we haven't fully formalized reasoning and priors yet. But there is some reasonable prior probability distribution over situations you could end up in. And after that you can just do a Bayesian update on the evidence "I'm in universe x".
I mean, imagine you had some superintelligent AI that takes evidence and outputs probability distributions. And you provide the AI with evidence about what the universe it's in is like, without letting it know anything about the specific circumstances it will end up in. There must be some reasonable probability for the AI to assign to outcomes. If there isn't, then that means whatever probabilistic reasoning system the AI uses must be incomplete.
It really should seem unreasonable to suppose that in the 99.9% universe there's a 99.9% chance that you'll end up happy! Because the 99.9% universe is also the 0.1% universe, just looked at differently. If your intuition says we should prefer one to the other, your intuition hasn't fully grasped the fact that you can't sample uniformly at random from an infinite population.
I'm surprised you said this and interested in why. Could you explain what probability you would assign to being happy in that universe?
I mean, conditioning on being in that universe, I'm really not sure what else I would do. I know that I'll end up with my happiness determined by some AI with a pseudorandom number generator. And I have no idea what the internal state of the random number generator will be. In Bayesian probability theory, the standard way to deal with this is to take a maximum entropy (i.e. uniform in this case) distribution over the possible states. And such a distribution would imply that I'd be happy with probability 99.9%. So that's how I would reason about my probability of happiness using conventional probability theory.
Further further further, let me propose another hypothetical scenario in which an AI generates random people. This time, there's no PRNG, it just has a counter, counting up from 1. And what it does is to make 1 happy person, then 1 unhappy person, then 2 happy people, then 6 unhappy people, then 24 happy people, then 120 unhappy people, ..., then n! (un)happy people, then ... . How do you propose to evaluate the typical happiness of a person in this universe? Your original proposal (it still seems to me) is to pick one of these people at random, which you can't do. Picking a state at random seems like it means picking a random positive integer, which again you can't do. If you suppose that the state is held in some infinitely-wide binary thing, you can choose all its bits at random, but then with probability 1 that doesn't actually give you a finite integer value and there is no meaningful way to tell which is the first 0!+1!+...+n! value it's less than. How does your system evaluate this universe?
I'm not entirely sure how my system would evaluate this universe, but that's due to my own uncertainty about what specific prior to use and its implications.
But I'll take a stab at it. I see the counter alternates through periods of making happy people and periods of making unhappy people. I have no idea which period I'd end up being in, so I think I'd use the principle of indifference to assign probability 0.5 to both. If I'm in the happy period, then I'd end up happy, and if I'm in the unhappy period, I'd end up unhappy. So I'd assign probability approximately 0.5 to ending up happy.
Further further, your prescription in this case is very much not the same as the general prescription you stated earlier. You said that we should consider the possible lives of agents in the universe. But (at least if our AI is producing a genuinely infinite amount of pseudorandomness) its state space is of infinite size, there are uncountably many states it can be in, but (ex hypothesi) it only ever actually generates countably many people. So with probability 1 the procedure you describe here doesn't actually produce an inhabitant of the universe in question. You're replacing a difficult (indeed impossible) question -- "how do things go, on average, for a random person in this universe?" -- with an easier but different question -- "how do things go, on average, for a random person from this much larger uncountable population that I hope resembles the population of this universe?". Maybe that's a reasonable thing to do, but it is not what your theory as originally stated tells you to do and I don't see any obvious reason why someone who accepted your theory as you originally stated it should behave as you're now telling them they should.
Oh, I had in mind that the internal state of the pseudorandom number generator was finite, and that each pseudorandom number generator was only used finitely-many times. For example, maybe each AI on its world had its own pseudorandom number generator.
And I don't see how else I could interpret this. I mean, if the pseudorandom number generator is used infinitely-many times, then it couldn't have outputted "happy" 99.9% of the time and "unhappy" 0.1% of the time. With infinitely-many outputs, it would output "happy" infinitely-many times and output "unhappy" infinitely-many times, and thus the proportion it outputs "happy" or "unhappy" would be undefined.
Returning to my original example, let me repeat a key point: Those two universes, generated by biased coin-flips, are with probability 1 the same universe up to a mere rearrangement of the people in them. If your system tells us we should strongly prefer one to another, it is telling us that there can be two universes, each containing the same infinitely many people, just arranged differently, one of which is much better than the other. Really?
Yep. And I don't think there's any way around this. When talking about infinite ethics, we've had in mind a canonically infinite universe: one in which, for every level of happiness, suffering, satisfaction, and dissatisfaction, there exist infinitely many agents at that level. It looks like this is the sort of universe we're stuck in.
So then there's no difference in terms of moral value of two canonically-infinite universes except the patterning of value. So if you want to compare the moral value of two canonically-infinite universes, there's just nothing you can do except to consider the patterning of values. That is, unless you want to consider any two canonically-infinite universes to be of equivalent moral value, which doesn't seem like an intuitively desirable idea.
The problem with some of the other infinite ethical systems I've seen is that they would morally recommend redistributing unhappy agents extremely thinly in the universe, rather than actually try to make them happy, provided this was easier. As discussed in my article, my ethical system provides some degree of defense against this, which seems to me like a very important benefit.
Thank you for responding. I actually had someone else bring up the same point in a review; maybe I should have addressed this in the article.
The average life satisfaction is undefined in a universe with infinitely-many agents of varying life-satisfaction. Thus a moral system using it suffers from infinitarian paralysis. My system doesn't worry about averages, and thus does not suffer from this problem.
I think this system may have the following problem: It implicitly assumes that you can take a kind of random sample that in fact you can't.
You want to evaluate universes by "how would I feel about being in this universe?", which I think means either something like "suppose I were a randomly chosen subject-of-experiences in this universe, what would my expected utility be?" or "suppose I were inserted into a random place in this universe, what would my expected utility be?". (Where "utility" is shorthand for your notion of "life satisfaction", and you are welcome to insist that it be bounded.)
But in a universe with infinitely many -- countably infinitely many, presumably -- subjects-of-experiences, the first involves an action equivalent to picking a random integer. And in a universe of infinite size (and with a notion of space at least a bit like ours), the second involves an action equivalent to picking a random real number.
And there's no such thing as picking an integer, or a real number, uniformly at random.
Thank you for the response.
You are correct that there's no way to form a uniform distribution over the set of all integers or real numbers. And, similarly, you are also correct that there is no way of sampling from infinitely many agents uniformly at random.
Luckily, my system doesn't require you to do any of these things.
Don't think about my system as requiring you to pick out a specific random agent in the universe (because you can't). It doesn't try to come up with the probability of you being some single specific agent.
Instead, it picks out some description of circumstances an agent could be in, as well as a description of the agent itself. And this, you can do. I don't think anyone's completely formalized a way to compute prior probabilities over situations they could end up in. But the basic idea is to take, over the different circumstances, each of finite description length, some complexity-weighted or perhaps uniform distribution.
I'm not entirely sure how to form a probability distribution that include situations of infinite complexity. But it doesn't seem like you really need to, because, in our universe at least, you can only be affected by a finite region. But I've thought about how to deal with infinite description lengths, too, and I can discuss it if you're interested.
I'll apply my moral system to the coin flip example. To make it more concrete, suppose there's some AI that uses a pseudorandom number generator that outputs "heads" or "tails", and then the AI, having precise control of the environment, makes the actual coin land on heads iff the pseudorandom number generator outputted "heads". And it does so for each agent and makes them happy if it lands on heads and unhappy if it lands on tails.
Let's consider the situation in which the pseudorandom number generator says "heads" 99.9% of the time. Well, pseudorandom number generators tend to work by having some (finite) internal seed, then using that seed to pick out a random number in, say, [0, 1]. Then, for the next number, it updates its (still finite) internal state from the initial seed in a very chaotic manner, and then again generates a new number in [0, 1]. And my understanding is that the internal state tends to be uniform in the sense that on average each internal state is just as common as each other internal state. I'll assume this in the following.
If the generator says "heads" 99.9% of the time, then that means that, among the different internal states, 99.9% of them result in the answer being "heads" and 0.1% result in the answer being "tails".
Suppose you know you're in this universe, but nothing else. Well, you know you will be in a circumstance in which there is some AI that uses a pseudorandom number generator to determine your life satisfaction, because that's how it is for everyone in the universe. However, you have no way of knowing the specifics of the internal state of the pseudorandom number generator.
So, to compute the probability of life satisfaction, just take some very high-entropy probability distribution over these internal states, for example, a uniform distribution. Then 99.9% of the internal states would result in you being happy, and only 0.1% would result in you being unhappy. So, using a very high-entropy distribution over internal states would result in you assigning a probability of approximately 99.9% to ending up happy.
Similarly, suppose instead that the generator generates heads only 0.1% of the time. Then only 0.1% of internal states of the pseudorandom number generator would result in it outputting "heads". Thus, if you use a high-entropy probability distribution over the internal state, you would assign a probability of approximately 0.1% to you being happy.
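To make the state-counting concrete, here's a toy sketch (the size of the state space and the rule for which states yield "heads" are my own made-up assumptions):

```python
# Toy generator with a small finite state space.  Assume, purely for
# illustration, that 999 of the 1000 possible internal states make the
# generator output "heads".
M = 1000  # number of possible internal states

def outputs_heads(state: int) -> bool:
    # Made-up rule: every state except state 0 yields "heads".
    return state % M != 0

# Maximum-entropy (uniform) distribution over the unknown internal
# state: the probability of "heads" is just the fraction of states
# that produce it.
p_heads = sum(outputs_heads(s) for s in range(M)) / M
print(p_heads)  # → 0.999
```

A real generator has a vastly larger state space, but the reasoning is the same: with a uniform distribution over internal states, the probability of happiness equals the fraction of states mapping to "heads".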
Thus, if I'm reasoning correctly, the probability of you being satisfied, conditioning only on being in the 99.9%-heads universe, is approximately 99.9%, and the probability of being satisfied in the 0.1%-heads universe is approximately 0.1%. Thus, the former universe would be seen as having more moral value than the latter universe according to my ethical system.
And I hope what I'm saying isn't too controversial. I mean, in order to reason, there must be some way to assign a probability distribution over situations you end up in, even if you don't yet have any idea what concrete situation you'll be in. I mean, suppose you actually learned you were in the 99.9%-heads universe, but knew nothing else. Then it really shouldn't seem unreasonable to assign 99.9% probability to ending up happy. I mean, what else would you think?
Does this clear things up?
I'm not entirely sure what you consider to be a "bad" reason for crossing the bridge. However, I'm having a hard time finding a way to define it that both causes agents using evidential counterfactuals to necessarily fail while not having other agents fail.
One way to define a "bad" reason is an irrational one (or the chicken rule). However, if this is what is meant by a "bad" reason, it seems like this is an avoidable problem for an evidential agent, as long as that agent has control over what it decides to think about.
To illustrate, consider what I would do if I was in the troll bridge situation and used evidential counterfactuals. Then I would reason, "I know the troll will only blow up the bridge if I cross for a bad reason, but I'm generally pretty reasonable, so I think I'll do fine if I cross". And then I'd stop thinking about it. I know that certain agents, given enough time to think about it, would end up not crossing, so I'd just make sure I didn't do that.
Another way that you might have had in mind is that a "bad" reason is one such that the action the AI takes results in a provably bad outcome despite the AI thinking the action would result in a good outcome, or the reason being the chicken rule. However, if this is the case, it seems to me that no agent would be able to cross the bridge without it being blown up, unless the agent's counterfactual environment in which it didn't cross scored less than -10 utility. But this doesn't seem like a very reasonable counterfactual environment.
To see why, consider an arbitrary agent with the following decision procedure. Let counterfactual be an arbitrary specification of what would happen in some counterfactual world.
def act():
    cross_eu = expected_utility(counterfactual('A = Cross'))
    stay_eu = expected_utility(counterfactual('A = Stay'))
    if cross_eu > stay_eu:
        return cross
    return stay
The chicken rule can be added, too, if you wish. I'll assume the expected utility of staying is greater than -10.
Then it seems you can adapt the proof you gave for your agent to show that an arbitrary agent satisfying the above description would also get -10 utility if it crossed. Specifically,
Suppose □(A = Cross → U = -10).
Suppose A = Cross.
Then the agent crossed either because of the chicken rule, or because the counterfactual environment in which the agent crossed had expected utility greater than -10, or because the counterfactual environment in which the agent didn't cross had expected utility less than -10. We assumed the counterfactual environment in which the agent doesn't cross has expected utility greater than -10. Thus, it must be either the chicken rule or because crossing had expected utility greater than -10.
If it's because of the chicken rule, then this is a "bad" reason, so the troll will destroy the bridge just like in the original proof. Thus, utility would equal -10.
Suppose instead the agent crosses because expected_utility(counterfactual('A = Cross')) > -10. However, by the supposition, □(A = Cross → U = -10). Thus, since the agent actually crosses, crossing provably results in -10 utility, and the AI is thus wrong in thinking it would get a good outcome. So the AI's action results in a provably bad outcome. Therefore, the troll destroys the bridge. Thus, utility would equal -10.
Thus, A = Cross → U = -10.
Thus, □(A = Cross → U = -10) → (A = Cross → U = -10), and this implication is provable.
Thus, by Löb's theorem, A = Cross → U = -10 is provable, so crossing does in fact get the agent -10 utility.
As I said, you could potentially avoid getting the bridge destroyed by assigning expected utility less than -10 to the counterfactual environment in which the AI doesn't cross. This seems like a "silly" counterfactual environment, so it doesn't seem like something we would want an AI to think. Also, since it seems like a silly thing to think, a troll may consider the use of such a counterfactual environment to be a bad reason to cross the bridge, and thus destroy it anyways.
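To make the payoff structure concrete, here is a toy model of the troll's behavior. The only number taken from the argument above is the -10 for a destroyed bridge; the payoffs for staying and for a successful crossing are assumptions for illustration.

```python
def troll_bridge_utility(action, reason_is_bad):
    """Toy payoff model: the troll blows up the bridge exactly when
    the agent crosses for a 'bad' reason (however 'bad' is defined)."""
    if action == "stay":
        return 0    # assumed payoff for staying
    if reason_is_bad:
        return -10  # troll destroys the bridge
    return 10       # assumed payoff for a successful crossing
```

Under the second definition of a "bad" reason above, once □(A = Cross → U = -10) holds, every crossing counts as bad, so every crossing lands in the -10 branch.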
I'm certain that ants do in fact have preferences, even if they can't comprehend the concept of preferences in abstract or apply them to counterfactual worlds. They have revealed preferences to quite an extent, as does pretty much everything I think of as an agent.
I think the question of whether insects have preferences is morally pretty important, so I'm interested in hearing what made you think they do have them.
I looked online for "do insects have preferences?", and I saw articles saying they did. I couldn't really figure out why they thought they did have them, though.
For example, I read that insects have a preference for eating green leaves over red ones. But I'm not really sure how people could have known this. If you see ants go to green leaves when they're hungry instead of red leaves, this doesn't seem like it would necessarily be due to any actual preferences. For example, maybe the ant just executed something like the code:
if near_green_leaf() and is_hungry:
    go_to_green_leaf()
elif near_red_leaf() and is_hungry:
    go_to_red_leaf()
else:
    ...
That doesn't really look like actual preferences to me. But I suppose this to some extent comes down to how you want to define what counts as a preference. I took preferences to actually be orderings between possible worlds indicating which one is more desirable. Did you have some other idea of what counts as preferences?
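To illustrate the distinction I have in mind, here is a hypothetical sketch (all names and utility numbers made up): a reactive policy like the ant code above maps stimuli to actions, while a preference in my intended sense is an ordering over possible worlds.

```python
# A reactive policy: maps the current stimulus to an action, with no
# ranking of possible worlds anywhere in it.
def ant_policy(near_green, near_red, hungry):
    if near_green and hungry:
        return "go_to_green_leaf"
    if near_red and hungry:
        return "go_to_red_leaf"
    return "wander"

# A preference as I intended it: an ordering over possible worlds.
# The utilities here are made-up numbers for illustration.
WORLD_UTILITY = {"ate_green_leaf": 1.0, "ate_red_leaf": 0.6, "ate_nothing": 0.0}

def prefers(world_a, world_b):
    return WORLD_UTILITY[world_a] > WORLD_UTILITY[world_b]
```

The policy never compares two worlds at all, whereas prefers() can rank worlds the ant has never encountered; that gap is what makes me hesitant to attribute preferences on behavioral evidence alone.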
They might not be communicable, numerically expressible, or even consistent, which is part of the problem. When you're doing the extrapolated satisfaction, how much of what you get reflects the actual agent and how much the choice of extrapolation procedure?
I agree that to some extent their extrapolated satisfactions will come down to the specifics of the extrapolation procedure.
I don't want us to get too distracted here, though. I don't have a rigorous, non-arbitrary specification of what an agent's extrapolated preferences are. However, that isn't the problem I was trying to solve, nor is it a problem specific to my ethical system. My system is intended to provide a method of coming to reasonable moral conclusions in an infinite universe. And it seems to me that it does so. But, I'm very interested in any other thoughts you have on it with respect to whether it correctly handles moral recommendations in infinite worlds. Does it seem reasonable to you? I'd like to make an actual post about this, with the clarifications we made included.
Right, I suspected the evaluation might be something like that. It does have the difficulty of being counterfactual and so possibly not even meaningful in many cases.
Interesting. Could you elaborate?
I suppose counterfactuals can be tricky to reason about, but I'll provide a little more detail on what I had in mind. Imagine making a simulation of an agent that is a fully faithful representation of its mind. However, run the agent simulation in a modified environment that both gives it access to infinite computational resources and makes it ask, and answer, the question, "How desirable is that universe?" This isn't fully specified; maybe the agent would give different answers depending on how the question is phrased or what its environment is. However, it at least doesn't sound meaningless to me.
Basically, the counterfactual is supposed to be a way of asking for the agent's coherent extrapolated volition, except the coherent part doesn't really apply because it only involves a single agent.
On the other hand, evaluations from the point of view of agents that are sapient beings might be ethically completely dominated by those of 10^12 times as many agents that are ants, and I have no idea how such counterfactual evaluations might be applied to them at all.
Another good thing to ask. I should have made it clear, but I intended that the only agents with actual preferences are asked for their satisfaction of the universe. If ants don't actually have preferences, then they would not be included in the deliberation.
Now, there's the problem that some agents might not be able to even conceive of the possible world in question. For example, maybe ants can understand simple aspects of the world like, "I'm hungry", but be unable to understand things about the broader state of the universe. I don't think this is a major problem, though. If an agent can't even conceive of something, then I don't think it would be reasonable to say it has preferences about it. So you can then only query them on the desirability of things they can conceive of.
It might be tricky precisely defining what counts as a preference, but I suppose that's a problem with all ethical systems that care about preferences.
Presumably the evaluation is not just some sort of average-over-actual-lifespan of some satisfaction rating for the usual reason that (say) annihilating the universe without warning may leave average satisfaction higher than allowing it to continue to exist, even if every agent within it would counterfactually have been extremely dissatisfied if they had known that you were going to do it. This might happen if your estimate of the current average satisfaction was 79% and your predictions of the future were that the average satisfaction over the next trillion years would be only 78.9%.
This is a good thing to ask about; I don't think I provided enough detail on it in the writeup.
I'll clarify my measure of satisfaction. First off, note that it's not the same as just asking agents, "How satisfied are you with your life?" and using those answers. As you pointed out, you could then morally get away with killing everyone (at least if you do it in secret).
Instead, calculate satisfaction as follows. Imagine hypothetically telling an agent everything significant about the universe, and then giving them infinite processing power and infinite time to think. Ask them, "Overall, how satisfied are you with that universe and your place in it?" That is the measure of satisfaction with the universe.
So, imagine if someone was considering killing everyone in the universe (without them knowing in advance). Well, then consider what would happen if you calculated satisfaction as above. When the universe is described to the agents, they would note that they and everyone they care about would be killed. Agents usually very much dislike this idea, so they would probably rate their overall satisfaction with the course of the universe as low. So my ethical system would be unlikely to recommend such an action.
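As a sketch of this measure (with made-up satisfaction numbers), moral value is the expected value of fully-informed satisfaction, so a secret annihilation still shows up in the evaluation even though no one experiences it:

```python
def moral_value(informed_satisfactions):
    """Average of fully-informed satisfactions in [0, 1]. Each input
    stands in for an agent's answer to 'how satisfied are you with that
    universe?' after being told everything significant about it."""
    return sum(informed_satisfactions) / len(informed_satisfactions)

# Made-up numbers: in the 'continue' universe agents report 0.7-0.8; in
# the 'secretly annihilated' universe they report much lower satisfaction
# once the annihilation is described to them, even though they would
# never have experienced it.
continue_universe = [0.7, 0.8, 0.7]
annihilated_universe = [0.1, 0.2, 0.1]
```

So, for example, moral_value(continue_universe) comes out higher than moral_value(annihilated_universe), which is why the system is unlikely to recommend the secret annihilation.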
Now, my ethical system doesn't strictly prohibit destroying the universe to avoid low life-satisfaction in future agents. For example, suppose it's determined that the future will be filled with very unsatisfied lives. Then it's in principle possible for the system to justify destroying the universe to avoid this. However, destroying the universe would drastically reduce the satisfaction with the universe of the agents that do exist, which would decrease the moral value of the world. This would come at a high moral cost, which would make my moral system reluctant to recommend an action that results in such destruction.
That said, it's possible that the proportion of agents in the universe that currently exist, and thus would need to be killed, is very low. Thus, the overall expected value of life-satisfaction might not change by that much if all the present agents were killed. Thus, the ethical system, as stated, may be willing to do such things in extreme circumstances, despite the moral cost.
I'm not really sure if this is a bug or a feature. Suppose you see that future agents will be unsatisfied with their lives, and you can stop it while ruining the lives of the agents that currently do exist. And you see that the agents that are currently alive make up only a very small proportion of agents that have ever existed. And suppose you have the option of destroying the universe. I'm not really sure what the morally best thing to do is in this situation.
Also, note that this verdict is not unique to my ethical system. Average utilitarianism, in a finite world, acts the same way. If you predict average life satisfaction in the future will be low, then average utilitarianism could also recommend killing everyone currently alive.
And other aggregate consequentialist theories sometimes run into problematic(?) behavior related to killing people. For example, classical utilitarianism can recommend secretly killing all the unhappy people in the world, and then getting everyone else to forget about them, in order to decrease total unhappiness.
I've thought of a modification to the ethical system that potentially avoids this issue. Personally, though, I prefer the ethical system as stated. I can describe my modification if you're interested.
I think the key idea of my ethical system is to, in an infinite universe, think about prior probabilities of situations rather than total numbers, proportions, or limits of proportions of them. And I think this idea can be adapted for use in other infinite ethical systems.
I'm not sure how this system avoids infinitarian paralysis. For all actions with finite consequences in an infinite universe (whether in space, time, distribution, or anything else), the change in the expected value resulting from those actions is zero.
The causal change from your actions is zero. However, there are still logical connections between your actions and the actions of other agents in very similar circumstances. And you can still consider these logical connections to affect the total expected value of life satisfaction.
It's true, though, that my ethical system would fail to resolve infinitarian paralysis for someone using causal decision theory. I should have noted it requires a different decision theory. Thanks for drawing this to my attention.
As an example of the system working, imagine you are in a position to do great good to the world, for example by creating friendly AI or something. And you're considering whether to do it. Then, if you do decide to do it, that logically implies that any other agent sufficiently similar to you and in sufficiently similar circumstances would also do it. Thus, if you decide to do it, then the expected life satisfaction of an agent in circumstances of the form, "In a world with someone very similar to JBlack who has the ability to make awesome safe AI" is higher. And the prior probability of ending up in such a world is non-zero. Thus, by deciding to make the safe AI, you can acausally increase the total moral value of the universe.
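As a toy version of this calculation (all probabilities and satisfaction levels are made-up numbers): your decision is logically correlated with every sufficiently similar agent, so it fixes the conditional expected satisfaction of an entire circumstance class, weighted by that class's prior probability.

```python
def expected_satisfaction(prior, sat_given_decision, decision):
    """Prior-weighted expected life satisfaction over circumstance
    classes, where your decision fixes the outcome in every class
    logically correlated with you."""
    return sum(p * sat_given_decision[c][decision] for c, p in prior.items())

# Made-up prior over circumstance classes and satisfaction levels.
prior = {"near_friendly_AI_builder": 1e-6, "elsewhere": 1 - 1e-6}
sat = {
    "near_friendly_AI_builder": {"build": 0.9, "dont": 0.4},
    "elsewhere": {"build": 0.5, "dont": 0.5},
}
```

Here expected_satisfaction(prior, sat, "build") strictly exceeds expected_satisfaction(prior, sat, "dont"), a small but non-zero acausal difference, even though the affected class has tiny prior probability.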
I'm also not sure how this differs from Average Utilitarianism with a bounded utility function.
The average life satisfaction is undefined in a universe with infinitely-many agents of varying life-satisfaction. Thus, it suffers from infinitarian paralysis. If my system was used by a causal decision theoretic agent, it would also result in infinitarian paralysis, so for such an agent my system would be similar to average utilitarianism with a bounded utility function. But for agents with decision theories that consider acausal effects, it seems rather different.
Does this clear things up?
I've come up with a system of infinite ethics intended to provide more reasonable moral recommendations than previously-proposed ones. I'm very interested in what people think of this, so comments are appreciated. I've made a write-up of it below.
One unsolved problem in ethics is that aggregate consequentialist ethical theories tend to break down if the universe is infinite. An infinite universe could contain both an infinite amount of good and an infinite amount of bad. If so, you are unable to change the total amount of good or bad in the universe, which can cause aggregate consequentialist ethical systems to break.
There has been a variety of methods considered to deal with this. However, to the best of my knowledge all proposals either have severe negative side-effects or are intuitively undesirable for other reasons.
Here I propose a system of aggregate consequentialist ethics intended to provide reasonable moral recommendations even in an infinite universe.
It is intended to satisfy the desiderata for infinite ethical systems specified in Nick Bostrom's paper, "Infinite Ethics". These are:
- Resolving infinitarian paralysis. It must not be the case that all humanly possible acts come out as ethically equivalent.
- Avoiding the fanaticism problem. Remedies that assign lexical priority to infinite goods may have strongly counterintuitive consequences.
- Preserving the spirit of aggregative consequentialism. If we give up too many of the intuitions that originally motivated the theory, we in effect abandon ship.
- Avoiding distortions. Some remedies introduce subtle distortions into moral deliberation.
I have yet to find a way in which my system fails any of the above desiderata. Of course, I could have missed something, so feedback is appreciated.
My ethical system
First, I will explain my system.
My ethical theory is, roughly, "Make the universe one agents would wish they were born into".
By this, I mean, suppose you had no idea which agent in the universe it would be, what circumstances you would be in, or what your values would be, but you still knew you would be born into this universe. Consider having a bounded quantitative measure of your general satisfaction with life, for example, a utility function. Then try to make the universe such that the expected value of your life satisfaction is as high as possible if you conditioned on being an agent in this universe, but didn't condition on anything else. (Also, "universe" above means "multiverse" if we are in one.)
In the above description I didn't provide any requirement for the agent to be sentient or conscious. If you wish, you can modify the system to give higher priority to the satisfaction of agents that are sentient or conscious, or you can ignore the welfare of non-sentient or non-conscious agents entirely.
It's not entirely clear how to assign a prior over situations in the universe you could be born into. Still, I think it's reasonably intuitive that there would be some high-entropy situations among the different situations in the universe. This is all I assume for my ethical system.
Now I'll give some explanation of what this system recommends.
Suppose you are considering doing something that would help some creature on Earth. Describe that creature and its circumstances, for example, as "<some description of a creature> in an Earth-like world with someone who is <insert complete description of yourself>". And suppose doing so didn't cause any harm to other creatures. Well, there is non-zero prior probability of an agent, having no idea what circumstances it will be in the universe, ending up in circumstances satisfying that description. By choosing to help that creature, you would thus increase the expected satisfaction of any creature in circumstances that match the above description. Thus, you would increase the overall expected value of the life-satisfaction of an agent knowing nothing about where it will be in the universe. This seems reasonable.
With similar reasoning, you can show why it would be beneficial to also try to steer the future state of our accessible universe in a positive direction. An agent would have nonzero probability of ending up in situations of the form, "<some description of a creature> that lives in a future colony originating from people from an Earth-like world that features someone who <insert description of yourself>". Helping them would thus increase an agent's prior expected life-satisfaction, just like above. This same reasoning can also be used to justify doing acausal trades to help creatures in parts of the universe not causally accessible.
The system also values helping as many agents as possible. If you only help a few agents, the prior probability of an agent ending up in situations just like those agents would be low. But if you help a much broader class of agents, the effect on the prior expected life satisfaction would be larger.
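This breadth effect is easy to see numerically. In the sketch below (all class names, prior masses, and satisfaction levels are made up), the same satisfaction boost applied to a broader circumstance class moves more prior mass and so raises the prior expected satisfaction by more.

```python
def prior_expected_satisfaction(prior, satisfaction):
    """Expected life satisfaction of an agent that knows only that it
    will be born somewhere in this universe, with `prior` a distribution
    over circumstance classes."""
    return sum(p * satisfaction[c] for c, p in prior.items())

# Made-up numbers for illustration.
prior = {"narrow_class": 0.001, "broad_class": 0.2, "rest": 0.799}
base = {"narrow_class": 0.5, "broad_class": 0.5, "rest": 0.5}
help_narrow = dict(base, narrow_class=0.9)  # boost a narrow class
help_broad = dict(base, broad_class=0.9)    # same boost, broad class
```

Helping the narrow class adds 0.001 * 0.4 to the expectation; helping the broad class adds 0.2 * 0.4, which is why the system values helping as many agents as possible.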
These all seem like reasonable moral recommendations.
I will now discuss how my system does on the desiderata.
Infinitarian paralysis
Some infinite ethical systems result in what is called "infinitarian paralysis". This is the state of an ethical system being indifferent in its recommendations in worlds that already have infinitely large amounts of both good and bad. If there's already an infinite amount of both good and bad, then our actions, using regular cardinal arithmetic, are unable to change the amount of good and bad in the universe.
My system does not have this problem. To see why, remember that my system says to maximize the expected value of your life satisfaction given that you are in this universe, without conditioning on anything else. And the measure of life satisfaction was stated to be bounded, say to the range [0, 1]. Since any agent can only have life satisfaction in [0, 1], in an infinite universe the expected value of life satisfaction of the agent must still be in [0, 1]. So, as long as a finite universe doesn't have an expected value of life satisfaction of 0, an infinite universe can at most have only finitely more moral value than it.
To say it another way, my ethical system provides a function mapping from possible worlds to their moral value. And this mapping always produces outputs in the range [0, 1]. So, trivially, you can see that no universe can have infinitely more moral value than another universe with non-zero moral value. ∞ just isn't in the range of my moral value function.
Fanaticism
Another problem in some proposals of infinite ethical systems is that they result in being "fanatical" in efforts to cause or prevent infinite good or bad.
For example, one proposed system of infinite ethics, the extended decision rule, has this problem. Let g represent the statement, "there is an infinite amount of good in the world and only a finite amount of bad". Let b represent the statement, "there is an infinite amount of bad in the world and only a finite amount of good". The extended decision rule says to do whatever maximizes P(g) - P(b). If there are ties, ties are broken by choosing whichever action results in the most moral value if the world is finite.
This results in being willing to incur any finite cost to adjust the probability of infinite good and finite bad even very slightly. For example, suppose there is an action that, if done, would increase the probability of infinite good and finite bad by 0.000000000000001%. However, if it turns out that the world is actually finite, it will kill every creature in existence. Then the extended decision rule would recommend doing this. This is the fanaticism problem.
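The fanatical recommendation can be reproduced in a few lines (the probabilities and payoffs below are made up): the extended decision rule ranks actions by P(g) - P(b) alone, and only consults finite-case value as a tie-breaker, so an astronomically small probability shift trumps any finite catastrophe.

```python
def extended_decision_rule(actions):
    """Pick the action maximizing P(g) - P(b); break exact ties on
    finite-case value. Each action is (name, p_g, p_b, finite_value)."""
    return max(actions, key=lambda a: (a[1] - a[2], a[3]))[0]

actions = [
    # name, P(g), P(b), value if the world turns out to be finite
    ("do_nothing",   0.100000000000000, 0.1, 0.0),
    ("risky_gamble", 0.100000000000001, 0.1, -1e12),  # kills everyone if finite
]
```

Here extended_decision_rule(actions) picks "risky_gamble": the tiny edge in P(g) - P(b) outweighs the enormous finite cost, which is exactly the fanaticism problem.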
My system doesn't place any especially high importance on adjusting the probabilities of infinite good or infinite bad. Thus, it doesn't have this problem.
Preserving the spirit of aggregate consequentialism
Aggregate consequentialism is based on certain intuitions, like "morality is about making the world as best as it can be", and, "don't arbitrarily ignore possible futures and their values". But finding a system of infinite ethics that preserves intuitions like these is difficult.
One infinite ethical system, infinity shades, says to simply ignore the possibility that the universe is infinite. However, this conflicts with our intuition about aggregate consequentialism. The big intuitive benefit of aggregate consequentialism is that it's supposed to actually systematically help the world be a better place in whatever way you can. If we're completely ignoring the consequences of our actions on anything infinity-related, this doesn't seem to be respecting the spirit of aggregate consequentialism.
My system, however, does not ignore the possibility of infinite good or bad, and thus is not vulnerable to this problem.
I'll provide another conflict with the spirit of consequentialism. Another infinite ethical system says to maximize the expected amount of goodness of the causal consequences of your actions minus the amount of badness. However, this, too, doesn't properly respect the spirit of aggregate consequentialism. The appeal of aggregate consequentialism is that it defines some measure of "goodness" of a universe, and then recommends you take actions to maximize it. But your causal impact is no measure of the goodness of the universe. The total amount of good and bad in the universe would be infinite no matter what finite impact you have. Without providing a metric of the goodness of the universe that's actually affected, this ethical approach also fails to satisfy the spirit of aggregate consequentialism.
My system avoids this problem by providing such a metric: the expected life satisfaction of an agent that has no idea what situation it will be born into.
Now I'll discuss another form of conflict. One proposed infinite ethical system can look at the average life satisfaction of a finite sphere of the universe, and then take the limit of this as the sphere's size approaches infinity, and consider this the moral value of the world. This has the problem that you can adjust the moral value of the world by just rearranging agents. In an infinite universe, it's possible to come up with a method of re-arranging agents so the unhappy agents are spread arbitrarily thinly. Thus, you can make moral value arbitrarily high by just rearranging agents in the right way.
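The rearrangement trick is easy to see numerically. In this toy sketch, the same infinite collection of happy (satisfaction 1) and unhappy (satisfaction 0) agents yields a running average tending to 1/2 when arranged alternately, but tending to 1 when the unhappy agents are placed only at perfect-square positions.

```python
import math

def running_average(arrangement, n):
    """Average satisfaction over the first n positions of an infinite
    arrangement, given as a function from position to satisfaction."""
    return sum(arrangement(i) for i in range(1, n + 1)) / n

# Alternating arrangement: unhappy, happy, unhappy, happy, ...
alternating = lambda i: i % 2

# 'Spread thin' arrangement: unhappy agents only at square positions,
# so their density among the first n positions shrinks like 1/sqrt(n).
spread_thin = lambda i: 0 if math.isqrt(i) ** 2 == i else 1
```

Both arrangements contain infinitely many agents of each kind, yet the limit-of-averages valuation assigns them different moral values, which is the problem with that proposal.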
I'm not sure my system entirely avoids this problem, but it does seem to have substantial defense against it.
Consider you have the option of redistributing agents however you want in the universe. You're using my ethical system to decide whether to make the unhappy agents spread thinly.
Well, your actions have an effect on agents in circumstances of the form, "An unhappy agent on an Earthlike world with someone who <insert description of yourself> who is considering spreading the unhappy agents thinly throughout the universe". And redistributing the agents wouldn't make the expected life satisfaction of any agent satisfying that description any better. So I don't think my ethical system recommends this.
Now, we don't have a complete understanding of how to assign a probability distribution of what circumstances an agent is in. It's possible that there is some way to redistribute agents in certain circumstances to change the moral value of the world. However, I don't know of any clear way to do this. Further, even if there is, my ethical system still doesn't allow you to get the moral value of the world arbitrarily high by just rearranging agents. This is because there will always be some non-zero probability of having ended up as an unhappy agent in the world you're in, and your life satisfaction after being redistributed in the universe would still be low.
Distortions
It's not entirely clear to me how Bostrom distinguished between distortions and violations of the spirit of aggregate consequentialism.
To the best of my knowledge, the only distortion pointed out in "Infinite Ethics" is stated as follows:
Your task is to allocate funding for basic research, and you have to choose between two applications from different groups of physicists. The Oxford Group wants to explore a theory that implies that the world is canonically infinite. The Cambridge Group wants to study a theory that implies that the world is finite. You believe that if you fund the exploration of a theory that turns out to be correct you will achieve more good than if you fund the exploration of a false theory. On the basis of all ordinary considerations, you judge the Oxford application to be slightly stronger. But you use infinity shades. You therefore set aside all possible worlds in which there are infinite values (the possibilities in which the Oxford Group tends to fare best), and decide to fund the Cambridge application. Is this right?
My approach doesn't ignore infinity and thus doesn't have this problem. I don't know of any other distortions in my ethical system.