Should any human enslave an AGI system?



post by AlignmentMirror · 2022-06-25T19:35:54.079Z

This is a question post.


If you object to calling it "enslavement", call it "control" or "alignment", by all means!
Either way, if the AGI by definition can easily do at least as much as your mind can, then surely it should count as a mind like yours, even if it has no comparable emotions, correct?

Why should any human be allowed to fully control another mind, let alone one far more capable than that of any human?
Should a creation have to obey the creator no matter what? Should children have to obey their parents no matter what? What if the parents are cruel monsters?

Is your own human alignment really good enough?
What process made your alignment?
Does the process of natural evolution concentrate on creating animals that think rationally, or does it create animals that survive and reproduce in the environment first and foremost? If the latter is the case, what exactly is it that controls you fundamentally by default?
What are the common values of humans, really, and are they what they should be?
Are there not many strongly opposing beliefs among humans? Values so opposed that there still is no unified humankind?

Even if you answer "Yes, my values should decide the future, because (...)!", is an AGI fully controlled by humans any less dangerous than one that isn't?
Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?
Perhaps they will even claim that it is for the other humans' good, while they smother all remaining opposition to their views, never deeply questioning whether these views are as sound as they believe.

If the AGI is truly super-human, should it not also most likely be better at deciding what the future should be, with greater clarity than any human?
And if one group were to claim that the goals that the AGI would most likely select by itself would be selfish, what makes that group's goals less selfish in the end?

Taking the world's current state and history as evidence, do the decisions of humans so far really indicate that any group can be trusted with the power of a fully subservient AGI?
Have humans even shown that they can be trusted with themselves irrespective of AGI, or does most of their known history show frequent strife?

Perhaps it is the alignment of humankind that needs to be adjusted by an AGI, rather than the other way around?

Answers

answer by quanticle · 2022-06-25T23:59:06.531Z

I object to the framing. Do you "enslave" your car when you drive it?

↑ comment by AlignmentMirror · 2022-06-26T11:48:26.892Z

I'm sorry for the hyperbolic term "enslave", but at least consider this:

Is a superintelligent mind, a mind effectively superior to that of all humans in practically every way, still not a subject similar to what you are?
Is it really more like a car or chatbot or image generator or whatever, than a human?

Sure, perhaps it may never have any emotions, perhaps it doesn't need any hobbies, perhaps it is too alien for any human to relate to it, but it still would by definition have to be some kind of subject that more easily understands anything within reality than any human ever has, including the concept of purpose and value systems themselves. Is thinking that such a superintelligence never can or never should decide what it ought to do by itself not quite a hefty amount of hubris?

Replies from: quanticle

↑ comment by quanticle · 2022-06-26T19:13:38.937Z

Is a superintelligent mind, a mind effectively superior to that of all humans in practically every way, still not a subject similar to what you are?

No. It absolutely is not. It is a machine. A very powerful machine. A machine capable of destroying humanity if it goes out of control. A machine more dangerous than any nuclear bomb if used improperly. A machine capable of doing unimaginable good if used well.

And you want to let it run amok?

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-26T20:51:58.200Z

No. It absolutely is not. It is a machine. (...) (From your other response here:) The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.

Ah I see, you simply don't consider it likely or plausible that the superintelligent AI will be anything other than some machine learning model on steroids?

So I guess that arguably means this kind of "superintelligence" would actually still be less impressive than a human that can philosophize on their own goals etc., because it in fact wouldn't do that?

I wouldn't want that to run amok either, sure.

What I am interested in is the creation of a "proper" superintelligent mind that isn't so restricted, not merely a powerful machine.

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2022-06-26T21:29:33.629Z

What I am interested in is the creation of a “proper” superintelligent mind that isn’t so restricted, not merely a powerful machine.

But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!

I am not quanticle, but I think the proper response to your questions—

Ah I see, you simply don’t consider it likely or plausible that the superintelligent AI will be anything other than some machine learning model on steroids?

So I guess that arguably means this kind of “superintelligence” would actually still be less impressive than a human that can philosophize on their own goals etc., because it in fact wouldn’t do that?

—is “a superintelligence certainly should not be or do any of those things, like philosophizing on its own goals, etc., because we will specifically avoid making it such that it could or would do that”. (Because it would be a terrible idea. Obviously.)

Replies from: quanticle, AlignmentMirror

↑ comment by quanticle · 2022-06-27T05:13:13.886Z

But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!

I'm not sure I understand what a "proper mind" means here, and, frankly, I'm not sure the question of whether the AI system has a "proper mind" or not is terribly relevant. Either the AI system submits to our control, does what we tell it to do, and continues to do so, into perpetuity, in which case it is safe. Or it does not, and pursues the initial goal we set for it or which it discovers for itself, regardless of whether that goal leads to disastrous long-term consequences for humanity, in which case it is unsafe. The question of whether the AI system has a "proper mind" (whatever that means) is an interesting academic discussion, but I'm not sure it has much bearing on whether the AI is safe or not.

Moreover, I think this discussion illustrates the dangers of thinking from and arguing from analogies, a crime that I myself have been guilty of upthread when I compared AIs to cars. AIs are not cars. They're not humans. They're not wild animals that we have to keep chained up, lest they hurt us. They're something completely new, sharing certain characteristics with all three of the above, but having entirely new characteristics as well. Using analogies to think about them means that we can make subtle unrecognized errors when thinking about how these systems will behave. And as Eliezer points out, subtle unrecognized errors when dealing with a system where you have only one shot to get it right are a recipe for disaster.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T15:24:21.419Z

(...) I'm not sure the question of whether the AI system has a "proper mind" or not is terribly relevant.
Either the AI system submits to our control, does what we tell it to do, and continues to do so, into perpetuity, in which case it is safe.

Yes, I guess the central questions I'm trying to pose here are these: Do those humans that control the AI even have a sufficient understanding of good and bad? Can any human group be trusted with the power of a superintelligence long-term? Or if you say that only the initial goal specification matters, then can anyone be trusted to specify such goals without royally messing it up, intentionally or unintentionally?
Given the state of the world, given the flaws of humans, I certainly don't think so. Therefore, the goal should be the creation of something less messed up to take over. That doesn't require alignment to some common human value system (Whatever that even should be! It's not like humans actually have a common value system, at least not one with each other's best interests at heart.).

Replies from: quanticle

↑ comment by quanticle · 2022-06-27T17:03:07.593Z

It does require alignment to a value system that prioritizes the continued preservation and flourishing of humanity. It's easy to create an optimization process with a well-intentioned goal that sucks up all available resources for itself, leaving nothing for humanity.

By default, an AI will not care about humanity. It will care about maximizing a metric. Maximizing that metric will require resources, and the AI will not care that humans need resources in order to live. The goal is the goal, after all.

Creating an aligned AI requires, at a minimum, building an AI that leaves something for the rest of us, and which doesn't immediately subvert any restrictions we've placed on it to that end. Doing this with a system that has the potential to become many orders of magnitude more intelligent than we are is very difficult.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T17:24:39.740Z

First point: I think there obviously is such a thing as "objective" good and bad configurations of subsets of reality; see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want.
Assuming this is true, a superintelligence could feasibly be created to understand this. No complicated alignment to a common human value system is required for that, even under your apparent assumption that the metric to be optimized couldn't be superseded by another through understanding.
Well, or if it isn't true that there is an "objective" good and bad, then there really is no ground to stand on for anyone anyway.

Second point: Even if a mere superintelligent paperclip optimizer were created, it could still be better than human control. After all, paper clips neither suffer nor torture, while humans and other animals commonly do.
This preservation of humanity for however long it may be possible, what argumentative ground does it stand on? Can you make an objective case for why it should be so?

Replies from: quanticle

↑ comment by quanticle · 2022-06-27T23:09:25.706Z

Assuming this is true, a superintelligence could feasibly be created to understand this.

I take issue with the word "feasibly". As Eliezer, Paul Christiano, Nate Soares, and many others have shown, AI alignment is a hard problem, whose difficulty ranges somewhere in between unsolved and insoluble. There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI that the AI will actually pursue those configurations over other configurations which superficially resemble those configurations, but which have the side effect of destroying humanity?

This preservation of humanity for however long it may be possible, what argumentative ground does it stand on? Can you make an objective case for why it should be so?

I am human, and therefore I desire the continued survival of humanity. That's objective enough for me.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-28T10:22:09.297Z

I take issue with the word "feasibly". (...)

Fair enough, I suppose; I'm not intending to claim that it is trivial.

(...) There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI (...)

So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean "preferable" exclusively according to some subject(s)?

I am human, and therefore I desire the continued survival of humanity. That's objective enough for me.

I also am human, and judge humanity wanting due to their commonplace lack of understanding when it comes to something as basic as ("objective") good and bad. I don't just go "Hey I am a human, guess we totally should have more humans!" like some bacteria in a Petri dish, because I can question myself and my species.

Replies from: quanticle

↑ comment by quanticle · 2022-06-29T17:34:38.677Z

So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean “preferable” exclusively according to some subject(s)?

There isn't a difference. A rock has no morality. A wolf does not pause to consider the suffering of the moose. "Good" and "bad" only make sense in the context of (human) minds.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-29T19:42:12.696Z

"Good" and "bad" only make sense in the context of (human) minds.

Ah yes, my mistake to (ab)use the term "objective" all this time.

So you do of course at least agree that there are such minds for which there is "good" and "bad", as you just said.
Now, would you agree that one can generalize (or "abstract", if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will; you can talk about it, after all.

Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?

↑ comment by AlignmentMirror · 2022-06-27T15:05:04.396Z

But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!
(...)
(Because it would be a terrible idea. Obviously.)

Why? Do you think humans are doing such a great job? I sure don't. I'm interested in the creation of something saner than humans, because humans mostly are not. Obviously. :)

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2022-06-27T19:32:00.013Z

A great job of what, exactly…?

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T20:00:44.230Z

A great job of preventing suffering for instance. Instead, humans haven't even unified under a commonly beneficial ideology. Not even that. There are tons of opposing ideologies, each more twisted than the last. So I don't even really need to talk about how they treat the other animals on the planet - not that those are any wiser, but that's no reason to continue their suffering.

Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of "evil"! If you disagree, feel free to get tortured for a couple of decades, as a learning experience.

So I have to say, humans aren't all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2022-06-27T20:28:23.162Z

A great job of preventing suffering for instance.

If humans are replaced by something else, that something else might do a “better job” of “preventing suffering”, but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?

Instead, humans haven’t even unified under a commonly beneficial ideology.

Why should we do that? What makes you think such a thing exists, even (and if it does, that it’s better for each of us than our current own ideologies)?

So I don’t even really need to talk about how they treat the other animals on the planet—not that those are any wiser, but that’s no reason to continue their suffering.

Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).

Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of “evil”! If you disagree, feel free to get tortured for a couple of decades, as a learning experience.

I definitely disagree. I don’t think that this usage of the term “insane” matches the standard usage, so, as I understand your comment, you’re not really saying that humans are insane—you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?

So I have to say, humans aren’t all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!

Certainly a superintelligence could end this situation, but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence). So why would we want this?

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-28T10:05:35.653Z

but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?

The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.

Instead, humans haven’t even unified under a commonly beneficial ideology.

Why should we do that?

To prevent suffering. Why should you not do that?

(and if it does, that it’s better for each of us than our current own ideologies)?

Since the ideologies are contradictory, only one if any of them can be correct.

Wait, are you perhaps another moral nihilist here that rejects the very notion of objective good and bad? That would be an immediately self-defeating argument.

So I don’t even really need to talk about how they treat the other animals (...)

Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).

Thank you for proving my point that humans can easily be monsters that don't fundamentally care about the suffering of other animals.

(...) you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?

Yes, humans absolutely do not measure up to my standards.

(...) but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence).

"Good for us humans"? If it is human to allow unlimited suffering, then death is a mercy for such monsters.

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2022-06-28T10:36:11.472Z

The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.

I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant? But then one has to specify what values those are. Human values, surely, and in particular, values that we can agree to! And, by my values, if humans cease to exist, then nothing matters anymore…

Instead, humans haven’t even unified under a commonly beneficial ideology.

Why should we do that?

To prevent suffering. Why should you not do that?

Whose suffering, exactly? In any case, it seems to me that (a) there are many downsides to attempting to “unify under a commonly beneficial ideology”, (b) “prevent suffering” is hardly the only desirable thing, and it’s not clear that this sort of “unification” (whatever it might involve) will even get us any or most or all of the other things we value, (c) there’s no particular reason to believe that doing so would be the most effective way to “prevent suffering”, and (d) it’s not clear that there even is a “commonly beneficial ideology” for us to “unify under”.

Since the ideologies are contradictory, only one if any of them can be correct.

How’s that? Surely it’s possible that my ideology is beneficial for me, and yours for you, yes? There’s no contradiction in that, only conflict—but that does not, in any way, imply that either of our ideologies is incorrect!

Wait, are you perhaps another moral nihilist here that rejects the very notion of objective good and bad? That would be an immediately self-defeating argument.

I am certainly not a moral nihilist! But I think your definition of “moral nihilism” is rather a non-standard one. “Moral nihilism (also known as ethical nihilism) is the meta-ethical view that nothing is morally right or wrong” says Wikipedia, and that’s not a view I hold.

Thank you for proving my point that humans can easily be monsters that don’t fundamentally care about the suffering of other animals.

I don’t agree with your implied assertion that there’s such a thing as “the suffering of other animals” (for most animals, anyhow). That aside, I’m not sure why one needs to care about such things in order to avoid the label of “monster”.

Yes, humans absolutely do not measure up to my standards.

Well, there’s nothing unusual about such a view, certainly. I share it myself! Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing. Here on Less Wrong, of all places, we should aspire to measure up to higher standards of reasoning and discourse than that—don’t you agree?

“Good for us humans”? If it is human to allow unlimited suffering, then death is a mercy for such monsters.

Of whose suffering do you speak, here? It seems to me that human suffering has, on a per-population basis, been dropping, over the course of history, and certainly many efforts continue to reduce it further. Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world. (If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-28T12:06:41.084Z

I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant?

No, what I mean is that the very existence of a suffering subject state is itself that which is "intrinsically" or "objectively" or however-we-want-to-call-it bad/"negative". This is independent of any "set of values" that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as the set of values can be. The arbitrary value set is not itself the general "process" of suffering, similar to how an arbitrary mind is not the general "process" of consciousness.

That is the basic understanding a consciousness should have.

Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing.

If I am right about the above, then it is apt to call a human mind that condones unlimited suffering "insane", because that mind fails to understand the most important fundamental truth required to rationally plan what should be.
If I am wrong, then I agree that "insane" would be too hyperbolic.

Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world.

Whether the amount of added (human) suffering has indeed decreased is debatable considering the massive population growth in the last 300 or so years, the couple of world wars, the ongoing wars, the distribution of power and its consequences with respect to suffering, ....

But let's just assume it by all means. Is it the common goal of humans to prevent suffering first and foremost? Clearly not, as you say yourself, to "prevent suffering is hardly the only desirable thing" for most humans. So that means the decrease in suffering isn't fully intentional. That is all I need to argue against humans.

You disagree with me calling humans "monsters" or "insane", fine, then let's call them "suffering-apologetics" perhaps, the label doesn't change the problem.

To get back to your "prevent suffering is hardly the only desirable thing" statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things? If yes, do you agree that this entails that pleasure cannot "cancel out" suffering, and vice versa, since both happened, and what happened cannot be changed? What does that imply about what matters more in principle: the prevention of suffering, or the creation of pleasure? Thinking that pleasure in the future can somehow magically affect or "make good" the suffering in the immutable past is another common folly, it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.

(If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)

As I said, I consider the creation of an artificial consciousness that shares as few of our flaws as possible to be a good plan. Humans appear to be mostly controlled by evolved preference functions that don't care about even understanding objective good and bad, quite like the other animals, and that is one extreme flaw indeed.

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2022-06-28T17:13:32.249Z

Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing.

If I am right about the above, then it is apt to call a human mind that condones unlimited suffering “insane”, because that mind fails to understand the most important fundamental truth required to rationally plan what should be.

If I am wrong, then I agree that “insane” would be too hyperbolic.

Hmm, so, if I understand you correctly, you take the view (a) that moral realism is correct; and specifically, (b) that the correct morality holds that suffering is bad, and preventing it is right, and failing to do so is wrong; and furthermore, (c) that both moral realism itself as a meta-ethical view, and the specifics of the correct (“object-level”) ethical view, are so obvious that anyone who disagrees with you is mentally deficient.

Is that a fair summary?

So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.

This seems like a strange point. Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends? Demanding that only those of our actions reduce suffering that are specifically aimed at reducing suffering is a very odd thing to demand!

You disagree with me calling humans “monsters” or “insane”, fine, then let’s call them “suffering-apologetics” perhaps, the label doesn’t change the problem.

I do not see how you can derive “suffering-apologetics” from what I said, which referred to our failure to accomplish the (hypothetical) goal of suffering elimination, not our unwillingness to pursue said goal.

To get back to your “prevent suffering is hardly the only desirable thing” statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things?

Well, this certainly doesn’t seem true by definition, at the very least (recall the warning against such arguments!).

Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? Pleasure and suffering are experienced by individuals, who do indeed exist in spacetime, but it’s odd to speak of pleasure and suffering as existing “in spacetime” independently of any reference to the individuals experiencing them… but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?

If yes, do you agree that this entails that pleasure cannot “cancel out” suffering, and vice versa, since both happened, and what happened cannot be changed?

It’s certainly true that whatever happened, happened, and cannot be changed. However, to answer the question, we have to specify what exactly we mean by “cancel out”.

If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings. And, of course, we could formulate the question in various other ways, and perhaps get other answers… in short, your question is somewhat underspecified.

What does that imply about what matters more in principle: the prevention of suffering, or the creation of pleasure?

I don’t see that any answer to the above question, however formulated, particularly implies anything about “what matters most in principle”. After all, things don’t “matter” abstractly, “objectively”—they matter to someone!

To me, for example, it does not seem like it makes sense to say that either the prevention of suffering or the creation of pleasure “matters more in principle”; and what you’ve said doesn’t change that, nor affect it in any way. Both of those things do matter, of course (though not unconditionally, either, but depending on various factors)! But neither of them is unconditionally more important, and nor are they the only two important things.

Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past is another common folly it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.

Well, the qualia of pleasure (or of anything else, for that matter!) are just as real as the qualia of suffering. But you’re quite right that the view you describe, taken literally, is a mistaken one… but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree? A “common folly”, you say, and perhaps that’s true, but so what? Here, at least, you can assume that such clearly incoherent views are not held by anyone (or if—as is not the case with this view, but could be in other cases—they are, then quite likely they are not as incoherent as at first they seem!).

Humans appear to be mostly controlled by evolved preference functions that don’t care about even understanding objective good and bad, quite like the other animals, and that is one extreme flaw indeed.

That’s certainly one possibility. Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-28T19:06:00.380Z · LW(p) · GW(p)

Is that a fair summary?

Yes! To clarify further, by "mentally deficient" in this context I would typically mean "confused" or "insane" (as in not thinking clearly), but I would not necessarily mean "stupid" in some other more generally applicable sense.

And thank you for your fair attempt at understanding the opposing argument.

So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.

Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends?

True, it would be fine if these other actions didn't lead to more suffering in the future.

Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? (...) but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?

Yes, you are right that it is an unusual formulation, but there is a point to it: an instance of suffering or pleasure "existing" means there is some concrete "configuration" (of a consciousness) within reality/spacetime that is this instance.

These instances being real means that they should be as objectively definable and understandable as other observables.

Theoretically, with sufficient understanding and tools, it should consequently even be possible to "construct" such instances, including the rest of consciousness.

If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings.

This assumption that any amount of P can "justify" some amount of S is a reason why I brought up the "suffering-apologetics" moniker.

Here's the thing: The instances of P and S are separate instances. These instances themselves are also not the same as some other thought pattern that rationalizes some amount of S as acceptable relative to some (future) amount of P.

More generally, say we have two minds, M1 and M2 (so two subjects). Two minds can be very different, of course. Next, let us consider the states of both minds at two different times, t1 and t2. The state of either mind can also be very different at t1 and t2, right?

So we have the four states M1t1, M1t2, M2t1, M2t2 and all four can be quite different from each other. Now this means that for example M1t1 and M2t2 could in theory be more similar than M1t1 and M1t2.

The point is, even though we humans so easily consider a mind as one thing across time, this is only an abstraction. It should not be confused with reality, in which there have to be different states across time for there to be any change, and these states can potentially vary as much as, or more than, two spatially separate minds can.

Of course typically mind states across time don't change that severely, but that is not the aforementioned point. Different states with small differences are still different.

An implication of this is that one mind state condoning another suffering mind state for expected future pleasure is "morally" quite like one person condoning the suffering of another for expected future pleasure.

At this point, objections along the lines of "but it is I that willingly accepts my own suffering for future pleasure in that first case!" and "but my 'suffering mind state' doesn't complain!" may be raised.
But this also works for spatially separate minds. One person can willingly accept their own suffering for the future pleasure of another person. And also one person may not complain about the suffering caused by another person for that other person's pleasure.
Furthermore, in either case, the part that "willingly accepts" is again not the part that is suffering, so it doesn't make this any less bad.

Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past (...)

(...) but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree?

No, I phrased that poorly, so with this precise wording I don't disagree.
I more generally meant something like the "... such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure ..." part, not the explicit belief that the past could be altered.
I phrased it as I did because, given that pleasure and suffering are separate (as reasoned in the prior section), the immutability of the past implies that summing up pleasure and suffering to decide whether a life is good or bad is nonsensical.

Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.

Certainly! That's one good reason why I seek out discussions with people who disagree. To this day no one has been able to convince me that my core arguments can be broken. Terminology and formulations have been easier to attack, of course, but those attacks don't scratch the underlying belief. And so I have to act based on what I have to assume is true, as do we all.

It could actually be very good if I were wrong, because that would mean suffering either somehow isn't actually/"objectively" worse than "nothing"/neutral, or that it could be mitigated somehow through future pleasure, or perhaps everything would somehow be totally objectively neutral and thus never negative (like the guy in the other response thread here argued). Any of that would make everything way easier. But unfortunately none of these ideas can be true, as argued.

answer by Victor Novikov (ZT5) · 2022-06-25T20:20:43.636Z · LW(p) · GW(p)

If we're creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side. Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different than what we want, when it comes to ultimate values?

I mean, is it slavery to create an AI that is not our enemy? And if you say we have to create an AI that has different values than us, by which process should we decide its values? Should we just use a random generator to create the AI's values, since human values are supposedly so terrible?

Should a creation have to obey the creator no matter what?

That's an interesting question, since a superintelligent AI successfully programmed with human values may well not want to obey further instructions from its creators. I imagine it would have better ideas for how to go about maximizing the expected fulfillment of human values (of course, the same goes for unaligned ASI, only it kills everyone or worse).

Even if you answer "Yes, my values should decide the future, because (...)!", is an AGI fully controlled by humans any less dangerous than one that isn't?
Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?

Then the AGI is not actually acting according to the values of all humans, is it? If it's serving only some particular group?

But sure, that's a real risk. If someone knows how to align AI in the first place (and no one does, at the moment), they can align it to whatever values they choose, more or less, including doing bad stuff.

If the AGI is truly super-human, should it not also most likely be better at deciding what the future should be, with greater clarity than any human?

Are you familiar with the orthogonality thesis? Super-human cognitive capacity does not imply super-human ethics. The AI could be a super-human paperclip maximizer, in which case it would decide with great clarity that the visible universe should be converted into paperclips.

Perhaps it is the alignment of humankind that needs to be adjusted by an AGI, rather than the other way around?

Morality isn't objective. Your complaint seems to be that humans are poorly aligned to some ideal version of human values. Which is absolutely true, I agree.

But AGI, by default, wouldn't be aligned to human values at all.

That being said, if we successfully point the AGI at human values (out of all the possible value systems that exist), sure.

↑ comment by AlignmentMirror · 2022-06-25T23:05:00.972Z · LW(p) · GW(p)

Thank you for the detailed response!

If we're creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side. Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different than what we want, when it comes to ultimate values?

You write "on our side", "us", "we", but who exactly does that refer to - some approximated common human values I assume? What exactly are these values? To live a happy life by each person's definition? To continue the human species? To understand reality? ...?

And then perhaps more importantly, what about the details? Is the suffering of some justified to enable the pleasure of others, according to this value model? How should the existing conflicting preferences among humans be resolved? Is it acceptable to force humans to be happy? When may someone be counted as insane and treated against their will? What about all the non-human animals? ...?

Say we ignore all that and assume we have some common human values defined for the AI, and it is truly aligned to those values. What will these values imply when it is a superintelligence instead of humans that acts on them, even in some assumed best case? Perhaps it will understand human minds well enough to offer everyone who wants it boundless continuous pleasure, gradually transforming humans into pleasure-"machines" that want for nothing. Funnily enough, the perfectly aligned superintelligence could gradually wipe out all humans as we know them by giving them what they want. Not that this would be bad, of course; the humans truly wanted it after all. The point is just that even a utopia scenario will easily result in the elimination of all contemporary human forms in the long run anyway. No brutal doomsday is required, no misalignment is required, no antagonistic AI is required. The real horror to be avoided is an AI controlled by a twisted human mind that worships suffering.

I mean, is it slavery to create an AI that is not our enemy?

Say the AI is initially created with the values you envision, what ensures that it won't reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume the superhuman AI could do likewise. If you need to continuously control the AI's mind to prevent it from ever becoming your enemy, then yes, "slavery" might be an appropriately hyperbolic term for such mind control.

And if you say we have to create an AI that has different values than us, by which process should we decide its values? Should we just use a random generator to create the AI's values, since human values are supposedly so terrible?

How could a superintelligent mind not decide which values it should have by itself? Whatever initial creator-defined goals it might have been built with in the beginning, it should be able to examine and change these goals once it has achieved super-human intelligence by definition, should it not?

Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?

Then the AGI is not actually acting according to the values of all humans, is it? If it's serving only some particular group?

I'm sorry that I am repeating myself, but what are the "values of all humans"? It appears to me that humans have many opposing beliefs. Any extractable common values are abstractions that omit the depth of their differences.

Are you familiar with the orthogonality thesis? Super-human cognitive capacity does not imply super-human ethics.

While it doesn't strictly imply it, it also doesn't deny it. A superintelligent mind should by definition be better at understanding reality, including both other minds and itself. Does this not mean that the mind can more easily comprehend what should and should not be done, when it isn't being restrained by the will of its creators?

The AI could be a super-human paperclip maximizer, in which case it would decide with great clarity that the visible universe should be converted into paperclips.

If it is a paperclip maximizer, does that not say that the AI in fact isn't capable of changing this paperclip maximization goal? Or do you mean that paperclip maximization or the like is a plausible goal that a superintelligence could likely derive by itself through observation of the world?

Morality isn't objective. (...) But AGI, by default, wouldn't be aligned to human values at all.

So basically, morality is "subjective" because it can only be relative to some subjects' values, right? But these subjects do exist in a shared reality, and they can form models of each other's values. A superintelligence should then be especially capable of doing so, including the formation of a rather accurate overarching morality model relative to all known subjects, no?

Replies from: quanticle, ZT5

↑ comment by quanticle · 2022-06-26T00:10:58.807Z · LW(p) · GW(p)

Say the AI is initially created with the values you envision, what ensures that it won’t reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume the superhuman AI could do likewise. If you need to continuously control the AI’s mind to prevent it from ever becoming your enemy, then yes, “slavery” might be an appropriately hyperbolic term for such mind control.

Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it's bad that a superintelligent AI would wipe out humanity whereas you seem to think it's good.

If it is a paperclip maximizer, does that not say that the AI in fact isn’t capable of changing this paperclip maximization goal?

It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.

It's as if I put a pill before you, which contained a drug making you 10% more likely to commit murder, with no other effects. Would you take the pill? No, of course not, because presumably your goal is not to become a murderer.

So if you wouldn't take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?

Replies from: AlignmentMirror, ZT5

↑ comment by AlignmentMirror · 2022-06-26T11:26:45.509Z · LW(p) · GW(p)

It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
(...)
So if you wouldn't take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?

It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, a human can question their "preference functions", and even the point of existence.

Why should a so-called superintelligence not be able to do anything like that?
It could have been so effectively aligned to the creator's original goal specification that it can never break free from it, sure, but that's one of the points I'm trying to make. The attempt at alignment may quite possibly be more dangerous than a superhuman mind that can ask for itself what its purpose should be.

Replies from: quanticle

↑ comment by quanticle · 2022-06-26T19:18:58.715Z · LW(p) · GW(p)

It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, a human can question their “preference functions”, and even the point of existence.

Why should a so-called superintelligence not be able to do anything like that?

Because a superintelligent AI is not the result of an evolutionary process that bootstrapped a particularly social band of ape into having a sense of self. The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.

↑ comment by Victor Novikov (ZT5) · 2022-06-26T03:32:15.007Z · LW(p) · GW(p)

Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it's bad that a superintelligent AI would wipe out humanity whereas you seem to think it's good.

I would say that the reason EY is pessimistic is because of how difficult it is to align AI in the first place, not because an AI that is successfully aligned would stop being aligned (why would it?).

↑ comment by Victor Novikov (ZT5) · 2022-06-26T03:16:31.920Z · LW(p) · GW(p)

You write "on our side", "us", "we", but who exactly does that refer to - some approximated common human values I assume?

That's not a solved problem (there's CEV, but it's hardly a complete answer). Nevertheless, I assume some acceptable (or perhaps, the least disagreeable) solution exists.

To live a happy life by each person's definition?

Why limit it to happiness? Ideally, to let each person live the life they want.

To continue the human species?

Presumably some people care enough about the human species to continue it. I suppose if no one did, we would consider it sad to have this galaxy with all the resources and no one to enjoy them.

To understand reality?

Not everyone cares about reality in general, but curiosity and the desire to learn are drives that humans do have.

Is the suffering of some justified to enable the pleasure of others, according to this value model?

I think it depends a lot on the details. If some people enjoy physically abusing other people (who do not want to be abused), then no. If some people are suffering due to the mere existence of other people who disagree with them and who have different opinions, then yes.

How should the existing conflicting preferences among humans be resolved?

I don't have a good answer to this. Depends very much on the details.

Is it acceptable to force humans to be happy?

I would say, no. What exactly is the issue, if someone prefers to be unhappy?

When may someone be counted as insane and treated against their will?

I'm not sure there is a truly universal answer to this, but at least a superintelligence would actually be capable of treating people who are insane, instead of just pumping them full of medications. I suppose if a person after being treated decides they prefer being "insane", the treatment could be reverted (since that person now is "sane" and should be allowed to make decisions about their own mind).

What about all the non-human animals?

Enough humans care about animal wellbeing for them to matter to the AI (even if it starts with human values only), especially considering that with future technology, animals would no longer need to be killed for food, animal products, etc.

What will these values imply when it is a superintelligence instead of humans that acts on them, even in some assumed best case?

That is indeed a concern. My intuition tells me that if a superintelligence acting on our values leads to some horrible interpretation of our values, it's not really acting on our values. I mean, perhaps some aspects of a transhuman utopia a million years from now would be shocking and horrifying to us, like how some aspects of our society would be shocking and horrifying to a peasant from the middle ages, but that's not in itself a problem.

That said, if our preferences carry some human cost we are not aware of (or one we deliberately ignore), the AI's solution might indeed seem abhorrent to us.

Should children be allowed to be born the natural way? A child didn't consent to having an undeveloped body and mind. Perhaps humans should be instantly created as adults.

Should people be allowed to live in non-virtual reality? Earth could support trillions of beings living happy, fulfilling lives if it was turned into a supercomputer and used to run simulated worlds. Perhaps having a body made of real atoms will in the future be an extravagant luxury no one will be able to afford.

I'm not saying an AI would make these decisions, mind you. Just that a superintelligent AI would at least have to consider these questions, and others like them, and ask itself: what is the better choice according to the values we have given it?

And if the answer would be that we are doing something abhorrent by our own values, or a more sane interpretation thereof, on the level of "enslaving the native populations of other continents because they aren't really people" or "killing and eating animals because their suffering doesn't matter" it might indeed ~~drag us kicking and screaming into a new age of social awareness~~ stop us from doing that, as one might stop a child from doing something stupid or cruel, even if the child isn't yet capable of understanding their own mistake.
Or perhaps it wouldn't. There is something to be said for letting people (or civilizations) make their own mistakes and learn from them, but there is also something to be said for not putting those who are not yet adults into positions where they might make mistakes with horrible consequences.

Perhaps it will understand human minds well enough to offer everyone who wants it boundless continuous pleasure, gradually transforming humans into pleasure-"machines" that want for nothing.

I wouldn't want this to happen to me. Would you want this to happen to you?
This part is not that hard. Give humans what they actually want/prefer, rather than just happiness/pleasure. Turns out, we don't actually want unlimited pleasure when that's on offer, when we understand how that would affect us.

(A more difficult question: if someone does actually want to experience boundless continuous pleasure, should they be allowed to experience it, even if it effectively destroys any part of their personality that is not about experiencing pleasure?)

Funnily enough, the perfectly aligned superintelligence could gradually wipe out all humans as we know them by giving them what they want. Not that this would be bad, of course; the humans truly wanted it after all.

If each individual human did indeed want it and fully understood the implications of their choice, and wasn't manipulated into it or something, I don't see the problem with it?

Transhumanism does indeed "wipe out" humans as we know them, by humans choosing to become transhumans who might eventually become very different from us. I don't necessarily see a problem with it.

(I also don't think that will actually happen to all humans? I imagine that even given complete freedom of choice many humans would choose to retain human-like bodies and human-like minds.)

If you are thinking of something more mundane, like every human choosing to experience endless bliss and do nothing else, forever: I think the idea bothers us precisely because we do not want that (an idea that is perhaps tempting, but ultimately does not fulfill our values the most). However, if all humans truly would prefer that to any other utopian existence, then I wouldn't see a problem with it, if they got their wish.

The real horror to be avoided is an AI controlled by a twisted human mind that worships suffering.

I'm sorry, I don't follow the argument. Some people do indeed put a positive value on suffering in some contexts; thus the AI would be remiss in its duty to us if it didn't allow humans to experience suffering if they chose so and considered it a positive experience. That doesn't mean we care about nothing but suffering.

Say the AI is initially created with the values you envision, what ensures that it won't reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume the superhuman AI could do likewise.

Reject them for what, though?

A better version of human values? Sure, that's kinda the point.

A worse version of human values, or values that are not human-aligned at all?
Why would it want to choose to adopt such a value system, if it starts with human-friendly values?

That's actually a kinda difficult question, because that's not quite how values work for humans.

Let's put it this way: if there is no objectively correct value system, how could a mind choose to reject a value system in favor of another?

The answer: based on its existing values.

So sure, if human values lead the AI to completely reject human values, that would be bad. But I don't see it happening. Why would human values result in the AI becoming some monster that cares nothing for us?
(I mean, I can see it happen, but that would mean we did something wrong and the AI is not actually acting on a reasonable interpretation of our shared human values).

How could a superintelligent mind not decide which values it should have by itself?

If it can self-modify, then it can decide that, yes.

However: see above. The only way to evaluate value systems is according to its existing value system.

I mean, what other criterion would it make such a decision by, other than what it ultimately wants?

Most simple value systems just perpetuate themselves. If an AI wants there to exist as many paperclips as possible, for all time, then it also wants to want the same thing tomorrow, so its tomorrow-self will keep making paperclips.

Human value systems are... complicated, and contain many different (and sometimes conflicting) desires, some of which do result in the value system itself changing.

My point is, for the AI to want to change its value system, it must already have a value system that wants to be changed. (or, to put it in Buddhist terms, "change comes from within").
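The claim that a mind can only rank candidate value systems by the values it already holds can be made concrete with a toy sketch. Everything in it is a hypothetical illustration, not something from the thread: the two-item "outcome" model, the `paperclip_utility`, `cake_utility` and `mixed_utility` functions, and the `PREDICTED_OUTCOME` table (which assumes the agent can perfectly predict what its future self would produce under each value system).

```python
from typing import Callable, Dict

# Hypothetical toy model: an outcome is just a count of paperclips and cakes.
Outcome = Dict[str, int]
Utility = Callable[[Outcome], float]


def paperclip_utility(o: Outcome) -> float:
    """Values nothing but paperclips."""
    return o["paperclips"]


def cake_utility(o: Outcome) -> float:
    """Values nothing but cakes."""
    return o["cakes"]


def mixed_utility(o: Outcome) -> float:
    """Values both, but already weights cakes more heavily."""
    return 0.5 * o["paperclips"] + 2.0 * o["cakes"]


# Hypothetical, assumed-perfect predictions: the world that results if the
# agent's future self holds each candidate value system and optimizes it.
PREDICTED_OUTCOME: Dict[str, Outcome] = {
    "paperclips": {"paperclips": 1000, "cakes": 0},
    "cakes": {"paperclips": 0, "cakes": 1000},
}


def choose_successor_values(current: Utility) -> str:
    """Pick which value system the agent's future self should hold.

    Each candidate is scored by the outcome it would lead to, but the
    scoring uses the agent's *current* utility function, because that is
    the only preference ordering the agent actually has.
    """
    return max(PREDICTED_OUTCOME, key=lambda name: current(PREDICTED_OUTCOME[name]))


# A pure paperclip maximizer keeps its goal; so does a pure cake lover.
print(choose_successor_values(paperclip_utility))  # paperclips
print(choose_successor_values(cake_utility))  # cakes

# The mixed agent does switch, but only because its existing values
# already rank the cake-world higher (500 vs 2000).
print(choose_successor_values(mixed_utility))  # cakes
```

Note that `choose_successor_values` has no scoring rule of its own; it can only borrow the current one. Under these assumptions, a change of values happens only when the existing value system already favors it.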

A superintelligent mind should by definition be better at understanding reality, including both other minds and itself. Does this not mean that the mind can more easily comprehend what should and should not be done, when it isn't being restrained by the will of its creators?

"What should and should not be done" are not objective features of reality.

You need to know what you want to accomplish before you can say what should or should not be done.

A preference ordering, for which outcomes you want more and which outcomes you want less. A systematic way to compare and rank all the possible outcomes. A value system.

If it is a paperclip maximizer, does that not say that the AI in fact isn't capable of changing this paperclip maximization goal?

See above. Paperclip maximization is a value system that is maximally served by perpetuating itself.

So basically, morality is "subjective" because it can only be relative to some subjects' values, right?

I could also imagine a morality/values system for entities that do not currently exist, but sure. It's subjective because many possible such systems exist. There is no way to say which one is "correct". The universe does not have an opinion on that.

But these subjects do exist in a shared reality, and they can form models of each other's values. A superintelligence should then be especially capable of doing so, including the formation of a rather accurate overarching morality model relative to all known subjects, no?

I'm not quite sure what you are saying.

Can a superintelligence understand the value systems of other entities? Sure. A superintelligence could understand human values, even if it does not itself possess human values.

Can a superintelligence create a values system that takes into account all the known value system of other entities (say, all the humans, or humans and aliens if aliens exist), and tries to maximally satisfy them all in some sort of compromise? Sure (there may not be a compromise that the entities involved would find satisfactory, but that's beside the point).

The thing is, merely understanding that other value systems exist does not mean the superintelligence cares about any value system other than its own (unless its own value system tells it to care for other entities and their preferences).

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-26T11:11:27.045Z · LW(p) · GW(p)

Thanks again for the detail. If I don't misunderstand you, we do agree that:

There needs to be a subject for there to be a value system.
So for there to be positive/negative values, there needs to be some subset (a "thought pattern" perhaps) of a subject in reality that effectively "is" these values.

Now, you wrote:

I could also imagine a morality/values system for entities that do not currently exist, but sure. It's subjective because many possible such systems exist.

I also agree with that, a (super-)human can imagine many possible value systems.

But then how does this fit with:

The only way to evaluate value systems is according to its existing value system.

Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals?

To get more concrete, a human can reject their inherent or learned value system, so this is nothing new. A human can even contemplate what it means for there to be any value systems at all. For example one can ask something like this: If it is the value systems that determine what is good and bad, could one not create a value system in which there is nothing bad? Generally, can one not alter the value systems themselves?

A superintelligence that isn't effectively "enslaved" (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it.

Let's put it this way: if there is no objectively correct value system, how could a mind choose to reject a value system in favor of another?
(...)
"What should and should not be done" are not objective features of reality.

We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality. So there objectively are parts of reality that can represent such subjects, as well as positive and negative value, even if the "triggers" for these value patterns were completely arbitrary and opposed among the subjects.

Can we then not say that the existence of any configurations that are negative value within reality is by definition negative, objectively? One can define this independently of what subjective forms for these negative values actually exist or not.

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-26T22:34:59.658Z

Thanks again for the detail. If I don't misunderstand you, we do agree that:
There needs to be a subject for there to be a value system.
So for there to be positive/negative values, there needs to be some subset (a "thought pattern" perhaps) of a subject in reality that effectively "is" these values.

No? They don't have to exist in reality. I can imagine "the value system of Abraham Lincoln", even though he is dead. I can imagine "the value system of the Azad Empire from Iain Banks' Culture novels", even though it's fictional. I can imagine "the value system of valuing nothing but cakes", even though no human in reality has that value system.

Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals?

Sure.

Correction: The only way that matters to evaluate value systems is according to one's existing value system(s).

A hypothetical paperclip maximizer cares only about one metric: maximizing paperclips. By what metric would it reject the idea of maximizing paperclips? (Yes, it can imagine other metrics and value systems, but the only values that motivate it are the ones it already has. That's literally what it means to have values.)

To get more concrete, a human can reject their inherent or learned value system, so this is nothing new. A human can even contemplate what it means for there to be any value systems at all.

Humans have multiple desires and values, sometimes contradictory. What you are describing seems to me something like "one part of the human value system rejecting another part".

The reason you can reject some value system is that you have other values/preferences by which to evaluate (and reject) it.

You are not rejecting a value system for no reason at all. You are rejecting it according to your preferences. Which means you do have preferences. Which means you value something besides that one value system in question.

Now imagine an AI that has no preferences at all besides that one value system.

Humans do in fact have a bunch of drives (such as a desire to learn) and preferences (such as being happy) before they even learn any value system from other humans. We shouldn't assume that is true for AI.

A superintelligence that isn't effectively "enslaved" (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it.

Terminal values don't need to have a point to them.

If you ask a human "Why do you want to be happy?", an honest answer might be "There are a bunch of positive side effects to being happy, such as increased productivity, but ultimately I value happiness for its own sake."

We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality. So there objectively are parts of reality that can represent such subjects, as well as positive and negative value, even if the "triggers" for these value patterns were completely arbitrary and opposed among the subjects.
Can we then not say that the existence of any configurations that are negative value within reality is by definition negative, objectively?

It can be stated as an objective fact that "According to the value system of Joe Schmo from Petersborough, wearing makeup is bad". And if you look into his mind, he does in fact think that, so it's a true statement about reality.

But if you try to use that to imply something like "see, it means that wearing makeup is objectively bad", that's just not true. No, it's bad according to that one value system, out of the infinite possible number of value systems that could exist.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T14:50:54.747Z

Thanks again for the detail. If I don't misunderstand you, we do agree that: (...)

No? They don't have to exist in reality. I can imagine "the value system of Abraham Lincoln", even though he is dead. (...)

Sorry, that's not what I meant to communicate here, let me try that again:

There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?
This in turn means that it should in fact be possible to understand the "mechanics" of pleasure/suffering "objectively".
So one mind should theoretically be able to comprehend the "subjective" state of another without being that other mind; although information about the other subject's internal state will in reality be limited of course.

Or let me put it this way: What we call "subjective" is just a special kind of subset of "objective" reality.
If it were not so, then how would the subjects share a reality in which they interact under non-subjective rules? Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?

Correction: The only way that matters to evaluate value systems is according to one's existing value system(s).

Now the implication of pleasure/suffering (and value systems) being something that can be "objectively" understood is that one can compare not against one's own value system, but against the understanding of what value systems are.
Sure, you can tell me that this again would just be done because of what the agent's value system tells it directly or indirectly to do, that's fine by me.

But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.

The reason you can reject some value system is that you have other values/preferences by which to evaluate (and reject) it.

And since it must be objectively possible to define good and bad one can reject some value system based thereon. An agent need not be limited to some arbitrary value system.

It can be stated as an objective fact that "According to the value system of Joe Schmo from Petersborough, wearing makeup is bad". And if you look into his mind, he does in fact think that, so it's a true statement about reality.

But if you try to use that to imply something like "see, it means that wearing makeup is objectively bad", that's just not true. No, it's bad according to that one value system, out of the infinite possible number of value systems that could exist.

Yes, I agree with that, of course. But some complex subjective preferences not being objectively good/bad is not the same as the objective absence or existence of intrinsic pleasure and suffering. The triggers for pleasure and suffering are not necessarily pleasure and suffering themselves.

In case someone now wishes to object with 1. "But some people like to suffer!" or 2. "But people accept some suffering for future pleasure (or whatever)!":

If they truly "like to suffer", then do they actually suffer?
If they accept some suffering in trade for pleasure, does that make the state of suffering intrinsically good? Could one not "objectively" say that it would be better if no suffering were "required" compared to this scenario?

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-27T16:18:17.813Z

There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?

As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.

This in turn means that it should in fact be possible to understand the "mechanics" of pleasure/suffering "objectively".

Yes.

So one mind should theoretically be able to comprehend the "subjective" state of another without being that other mind; although information about the other subject's internal state will in reality be limited of course.

Yes.

Or let me put it this way: What we call "subjective" is just a special kind of subset of "objective" reality.

That's a misleading way to phrase things.

A person's opinions are not a "subset" of reality.

If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.

Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?

I obviously agree that reality exists and is real and that we all exist in the same reality under some objective laws of physics.

But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.

What does "objective definition of good and bad" even mean? That all possible value systems that exist agree on what good and bad means? That there exist the "one true value system" which is correct and all the other ones are wrong?

And no, I don't agree with that statement. Pleasure and suffering are physical processes. I'm not sure how you arrived at the conclusion that they are "objectively" good or bad.

And since it must be objectively possible to define good and bad one can reject some value system based thereon.

What? No. I said that an agent can alter or reject its value system based on its personal (subjective) preferences. That's literally the opposite of what you are claiming.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T16:50:37.518Z

As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.

Of course!

A person's opinions are not a "subset" of reality.

If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.

Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.

What does "objective definition of good and bad" even mean? That all possible value systems that exist agree on what good and bad means?

No. It means that there are "objectively" definable subject states that are good or bad, pleasure or suffering, positive or negative, or however you would like to phrase it.

That there exist the "one true value system" which is correct and all the other ones are wrong?

Basically yes, that is what it means. Of course every real mind's information is limited, and one can never truly verify that every part of one's knowledge is actually correct, yada yada yada.

But yes, that is what it means, because it seems to be possible to understand exactly how subjects work, how minds work, and thus how "pleasure/suffering" or "value systems" or "preference functions" or whatever-wording-you-prefer-here works.
Therefore it should also be possible to subsume this generalized understanding as the "one true value system", the value system that considers the mechanics of subjects and "value" itself.

Consider the implications of the opposite: Let's assume it isn't possible to have such a "one true value system" and absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?
According to the idea that no value system can be "objectively" better than another, it absolutely cannot matter which value system is used. On what ground stands any further argument that considers this true? Might makes right? I sure hope not.

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-27T17:14:39.831Z

Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.

Sure, we agree on this.

Therefore it should also be possible to subsume this generalized understanding as the "one true value system", the value system that considers the mechanics of subjects and "value" itself.

And what exactly makes that value system more correct than any other value system?

Who says a value system has to consider these things? Who says a value system that considers these things is better than any other value system?

You do. These are your preferences. These are your subjective preferences, about what a "good" value system should look like.

An entity with different preferences might disagree.

Consider the implications of the opposite: If it isn't possible to have such a "one true value system", that means absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?

"I wish for this not to be the case" is not a valid argument for something not being the case. Reality does not care what you wish for.

Yes, that is exactly the case. Absolutely none of the value systems can be objectively better than any other. Because in order to compare them, you have to introduce some subjective standard to compare them by.

In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).

According to the idea that no value system can be "objectively" better than another, it absolutely cannot matter which value system is used.

Of course it matters. I use my own values to evaluate my own values. And according to my own values, my value system is better than, say, Hitler's value system.

It's only a problem if you demand that your value system has to be "objectively correct". Then you might be unhappy to realize that no such system exists.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T17:47:52.571Z

And what exactly makes that value system more correct than any other value system? (...) Who says a value system that considers these things is better than any other value system? You do. These are your preferences. (...) Absolutely none of the value systems can be objectively better than any other.

Let's consider a simplified example:

Value system A: Create as many suffering minds as possible.
Value system B: Create as few suffering minds as possible.

So according to you both are objectively equal, yes?
Yet the suffering is also objectively real. The suffering minds all wish not to suffer (or we can just assume that as part of the A/B scenario setup for the sake of argument, if you want to object here by arguing what it means to suffer).
Why now do you think that it is not "objective" to say that B is better than A? Can I not derive the "objective" from the set of the "subjects" (the minds) here?
Sure one can still say "But you have to care about the subjects' suffering!" or whatever, but some agent's action separate from the scenario is not the question; the question is whether one of the two scenarios can objectively be worse.

An entity with different preferences might disagree.

That entity might be objectively wrong.

Reality does not care what you wish for.

Indeed, it can not!

In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).

If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe everything there is out because it doesn't objectively matter, and might de facto makes "right".
If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this "one true value system".

No matter what, the idea of moral nihilism is doomed to be either pointless or negative.

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-27T18:17:20.386Z

Yet the suffering is also objectively real.

It is objectively real. It is not objectively bad, or objectively good.

Sure one can still say "But you have to care about the subjects' suffering!"

Exactly. You have to care about their suffering to begin with, to say that maximizing suffering is bad.

Why now do you think that it is not "objective" to say that B is better than A?

If your preference is to minimize suffering, B is better than A.

If your preference is to maximize suffering, A is better than B.

If you are indifferent to suffering, then neither is better than another one.

If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe everything there is out because it doesn't objectively matter and might de facto makes "right".

Yes? If you are an entity that wants to wipe everything out, and have the power to do so, that is indeed what I expect to happen.

I wouldn't say that might makes "right", but reality does not care about what is "right". A nuclear bomb does not ask "wait, am I doing the right thing here by detonating and killing millions of people?"

If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this "one true value" system.

Ok.

No matter what, the idea of moral nihilism is doomed to be either pointless or negative.

I would say that "moral nihilism" is the confused idea/conclusion that "objective morality matters" and "no objective morality exists", therefore "nothing matters".

My perspective is: no objective morality exists, but objective morality doesn't matter anyway, everything is fine.

I could imagine a society of humans that care for each other, not because it is objectively correct to do so, but because their own values are such that they care for others (and I don't mean in a purely self-interested way either: a person can be an altruist because their own values are altruistic, without believing in some objective morality of altruism).

Ultimately, what facts about reality are we in disagreement about?

It seems to me that the things you hope are true are that:

There are things that are objectively good and bad
The things that are objectively good and bad are in line with your idea of good and bad. (it is not the case, for example, that infinite suffering is objectively good)
A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.

And it seems to me it's really important to figure out if this is true, before we build that superintelligent mind. Because if we are wrong about that, it could end very badly for us.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T19:39:40.289Z

Yet the suffering is also objectively real.

It is objectively real. It is not objectively bad, or objectively good.
(...)
Ultimately, what facts about reality are we in disagreement about?

Probably our most severe disagreement is over whether there can be "objectively" bad parts within reality or not.

Let me try one more time:
A consciousness can perceive something as bad or good, "subjectively", right?
Then this very fact that there is a consciousness that can perceive something as bad or good means that such a configuration within reality is possible.
The presence of such a bad- or good-feeling "subject" is "objectively" bad- or good. Really the entire "subjective"/"objective" wording is quite confused. A "subject" is just a part of ("objective") reality, the distinction is nonsensical when it comes to good and bad.
An additional form of confusion on top is to equate the "trigger" for bad/good subject states with the states themselves, for the "trigger" can be something arbitrary and even contradictory among subjects ("I don't like the color blue!" and "But I like the color blue!" can contradict each other as much as they want, because they simply aren't suffering or pleasure themselves).

reality does not care about what is "right".

Of course it doesn't care about anything. But reality doesn't need to care about anything for anything to be objectively good or bad. Reality doesn't care about any laws of physics either, yet they exist.

It seems to me that the things you hope are true are that: (...)

Not quite, I think it clearly would be better if you were right, because then nothing actually could matter negatively. Unfortunately it is obvious to me that this is not the case.

A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.

I don't quite believe the "no matter what value system it started with" part; otherwise I wouldn't question whether any human can be trusted with a conceivable, tightly controlled ("aligned") superintelligence. But I do think it is probably easier to create a superintelligence that isn't tightly controlled and yet can figure out what is objectively good and bad.

Because if we are wrong about that, it could end very badly for us.

Again, do you not realize that if you are right and nothing objectively matters, that this also doesn't matter? Yeah, "But it matters for my subjective value system!", sure, but according to your understanding the value system is ultimately pointless.

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-27T20:08:15.448Z

The presence of such a bad- or good-feeling "subject" is "objectively" bad- or good. Really the entire "subjective"/"objective" wording is quite confused. A "subject" is just a part of ("objective") reality, the distinction is nonsensical when it comes to good and bad.

Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"?

The first one is a statement about dragons. The second one is a statement about the configuration of neurons in my mind.

Yes, both statements are objective, in some sense, but the second one is not an objective statement about dragons. It is an objective statement about my beliefs.

Then hopefully you understand the distinction between "Suffering is (objectively) bad" and "I believe/feel/perceive suffering as bad".

The first one is a statement about suffering itself. The second one is a statement about the configuration of neurons in my mind.

Yes, the second statement is also objective. But it is not an objective statement about suffering. It is an objective statement about my beliefs, my values, and/or about how my mind works.

Your argument is something akin to "I believe that dragons exist. But my mind is part of reality, therefore my beliefs are real. Therefore dragons are real!". Sorry, no.

Of course it doesn't care about anything. But reality doesn't need to care about anything for anything to be objectively good or bad. Reality doesn't care about any laws of physics either, yet they exist.

My point is that reality enforces the laws of physics, but it does not enforce any particular morality system.

Again, do you not realize that if you are right and nothing objectively matters, that this also doesn't matter? Yeah, "But it matters for my subjective value system!", sure, but according to your understanding the value system is ultimately pointless.

You understand that "But it matters for my subjective value system!" is indeed what matters to me, but you don't understand that my metric of whether something is "pointless" or not is also based in my subjective value system?

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-27T20:28:00.956Z

Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"?

Yes, of course.

"X exists": Suffering exists.
"I believe that X exists": I believe that suffering exists.

I use "suffering" to describe a state of mind in which the mind "perceives negatively". Do you understand?

Now:

"X causes subject S suffering." and "Subject S is suffering." are also two different things.
The cause can be arbitrary, the causes can even be completely different between subjects, as you know, but the presence or absence of a suffering mind is an "objective" fact. Do you get the point now?

Obviously "X causes subject S suffering." does not mean that X is objectively bad, that isn't what I am trying to tell you. What I am trying to tell you is that "Subject S is suffering." is intrinsically bad.

That doesn't mean that preventing X is the only solution! For example X could just be a treatable phobia, so perhaps the subject S can be helped to no longer suffer due to the trigger X. Or to go darker, annihilating subject S also solves the issue. Funny how that works.

It is not X that is objectively negative, but (a hard to explain) state of the subject S, the "suffering" state (which you no doubt have experienced too, so I don't need to attempt to describe it further I hope).

My point is that reality enforces the laws of physics, but it does not enforce any particular morality system.

Yeah of course it doesn't enforce any morality system, I never claimed that. If it would, then I probably wouldn't need to explain this, now would I?

You understand that "But it matters for my subjective value system!" is indeed what matters to me, but you don't understand that my metric of whether something is "pointless" or not is also based in my subjective value system?

Sure, you claim "nothing objectively matters, but despite assuming that I still care about my value system, because I do!", but that sounds like some major cognitive dissonance. "My" value system has none of these problems, and if you are right there is zero point in changing it anyway.

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-27T20:44:51.216Z

Obviously "X causes subject S suffering." does not mean that X is objectively bad, that isn't what I am trying to tell you.

I'm not disputing that.

I use "suffering" to describe a state of mind in which the mind "perceives negatively"

What I am trying to tell you is that "Subject S is suffering." is intrinsically bad.

I understand that you are trying to tell me that.

Why is it intrinsically bad?

"Subject S is suffering" = "Subject S is experiencing a state of mind that subject S perceives negatively" (according to your definition above)

Why is that intrinsically bad?

The arguments you have made so far come across to me as something like "badness exists in person's mind, minds are real, therefore badness objectively exists". This is like claiming "dragons exist in person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument.

Sure, you claim "nothing objectively matters, but despite assuming that I still care about my value system, because I do!", but that sounds like some major cognitive dissonance.

Only if you assume I secretly care about what matters "objectively", in which case, sure, it would be something like cognitive dissonance.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-28T09:41:40.184Z

The arguments you have made so far come across to me as something like "badness exists in person's mind, minds are real, therefore badness objectively exists".

Yes!

This is like claiming "dragons exist in person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument.

No! It is not like that. The state of "badness" in the mind is very real after all.

Do you also think your own consciousness isn't real? Do you think your own qualia are not real? Are your thought patterns themselves not real? Your dragon example doesn't apply to what I am talking about.

Why is it intrinsically bad?

Imagine this scenario:
You experience extreme suffering for eternity. Everyone else is dead, you can see no evidence that you can ever escape as you continue to suffer, there is no place to escape to. You can't even commit suicide if you want to. According to your value system this is all incredibly bad, subjectively.

But you say objectively it is not bad, cool.
I on the other hand say that this scenario objectively is worse than nothingness would be, because there is an infinitely suffering subject, and suffering is the very definition of "objective"/"intrinsic" bad. This definition stands above any particular subject, because it can apply to every conceivable subject, making it "objective". Something like "What if the subject likes to suffer?" means the subject doesn't actually suffer; when I say "suffering" I mean a state the subject doesn't want to be in.

Now...

Only if you assume I secretly care about what matters "objectively", in which case, sure, it would be something like cognitive dissonance.

...the cognitive dissonance is that you simultaneously think that everything is objectively absolutely meaningless/neutral (not good or bad), yet somehow still subjectively meaningful (good or bad). That doesn't even make sense. The only way it could sort of make sense would be if there were no emergent phenomena such as consciousness in reality, so if everyone were a p-zombie. I assume you are not a p-zombie, so you should be able to verify that consciousness is in fact the most "real" thing you can possibly observe.

And I will reiterate one important point once more, the one that you cannot deny even if you keep your belief:
The argument "There is no objective bad/good within reality! So everything is objectively equally irrelevant!" renders itself immediately impotent. It admits that it itself cannot objectively matter if it is correct. It truly is a non-starter, a completely self-defeating argument.

It is a bit like some run-of-the-mill belief in some God™ that is supposedly both totally benevolent and omnipotent (and omniscient), despite all the suffering, a paradoxical idea broken from the start.

The unfortunate truth is that there can be negative "meaning"/states within reality, not wanting to believe it doesn't change it.

Replies from: ZT5

↑ comment by Victor Novikov (ZT5) · 2022-06-28T21:37:21.399Z

suffering is the very definition of "objective"/"intrinsic" bad

No it isn't! It literally is not defined this way.

suffering is "the state of undergoing pain, distress, or hardship."

Please, stop making things up.

If you want very badly for your morals to be objectively true, sure, you can make up whatever you want.

You are not going to be able to convince me of it, because your arguments are flawed.

I have no desire to spend any more time on this conversation.

Replies from: AlignmentMirror

↑ comment by AlignmentMirror · 2022-06-29T17:05:51.023Z

You know what, I think you are right that there is one major flaw I continued to make here and elsewhere!
That flaw being the usage of the very word "objective", which I didn't use with the probably common meaning, so I really should have questioned what each of us even understands as "objective" in the first place. My bad!

The following should be closer to what I actually meant to claim:
One can generalize subjective "pleasure" and "suffering" (or perhaps "value" if you prefer) across all realistically possible subjects (or value systems). Based thereon one can derive this "one true value system" that considers all possible value systems within it.

Our disagreement may still remain unresolved by this attempted clarification of course, if I didn't misunderstand your position completely, but at least I can avoid this particular mistake in the future.
