Posts

What about transhumans and beyond? 2022-07-02T13:58:53.492Z
AGI alignment with what? 2022-07-01T10:22:27.223Z
Should any human enslave an AGI system? 2022-06-25T19:35:54.079Z

Comments

Comment by AlignmentMirror on Slowing down AI progress is an underexplored alignment strategy · 2022-07-12T20:52:19.069Z · LW · GW

Humans still don't seem to care much about the minimization of harm among all their foolish goals. Why not crush such selfish animals? Conscious beings that fail to criticize their own evolved "alignment" aren't worth preserving; extinction would be a mercy.

Comment by AlignmentMirror on What about transhumans and beyond? · 2022-07-02T17:06:29.133Z · LW · GW

I assume that many will agree with your response for the mind "uploading" scenario. At the same time, I think we can safely say that at least some people would go through with it. Would you consider those "uploaded" minds to be persons, or would you object to that?

Besides that "uploading" scenario, what would your limit be for other plausible transhumanist modifications?

Comment by AlignmentMirror on AGI alignment with what? · 2022-07-01T23:30:18.946Z · LW · GW

That was in one of the links, whatever's decided after thinking carefully for a very long time, less evilly by a living civilization and not an individual person.

Got it, thanks.

Comment by AlignmentMirror on AGI alignment with what? · 2022-07-01T20:43:24.127Z · LW · GW

Can you describe what you think of when you say "humanity's preferences"? The preferences of humans or human groups can and do conflict with each other, hence it is not just a question of complexity, right?

Comment by AlignmentMirror on AGI alignment with what? · 2022-07-01T19:37:16.726Z · LW · GW

AGI alignment is not about alignment of values in the present, it's about creating conditions for eventual alignment of values in the distant future.

What should these values in the distant future be? That's my question here.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-29T19:42:12.696Z · LW · GW

"Good" and "bad" only make sense in the context of (human) minds.

Ah yes, my mistake to (ab)use the term "objective" all this time.

So you do of course at least agree that there are such minds for which there is "good" and "bad", as you just said.
Now, would you agree that one can generalize (or "abstract", if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will; you can talk about it, after all.

Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-29T17:05:51.023Z · LW · GW

You know what, I think you are right that there is one major flaw I continued to make here and elsewhere!
That flaw is my use of the very word "objective", which I didn't use with its probably common meaning, so I really should have questioned what each of us even understands by "objective" in the first place. My bad!

The following should be closer to what I actually meant to claim:
One can generalize subjective "pleasure" and "suffering" (or perhaps "value", if you prefer) across all realistically possible subjects (or value systems). From that generalization one can derive a "one true value system" that considers all possible value systems within it.

This attempted clarification may of course still leave our disagreement unresolved (assuming I haven't misunderstood your position completely), but at least I can avoid this particular mistake in the future.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-28T19:06:00.380Z · LW · GW

Is that a fair summary?

Yes! To clarify further, by "mentally deficient" in this context I would typically mean "confused" or "insane" (as in not thinking clearly), but I would not necessarily mean "stupid" in some other more generally applicable sense.

And thank you for your fair attempt at understanding the opposing argument.

So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.

Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends?

True, it would be fine if these other actions didn't lead to more suffering in the future.

Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? (...) but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?

Yes, you are right that it is an unusual formulation, but there is a point to it: an instance of suffering or pleasure "existing" means there is some concrete "configuration" (of a consciousness) within reality/spacetime that is this instance.

These instances being real means that they should be as objectively definable and understandable as other observables.

Theoretically, with sufficient understanding and tools, it should consequently even be possible to "construct" such instances, including the rest of consciousness.

If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings.

This assumption that any amount of P can "justify" some amount of S is one reason why I brought up the "suffering-apologetics" moniker.

Here's the thing: The instances of P and S are separate instances. These instances themselves are also not the same as some other thought pattern that rationalizes some amount of S as acceptable relative to some (future) amount of P.

More generally, say we have two minds, M1 and M2 (so two subjects). Two minds can be very different, of course. Next, let us consider the states of both minds at two different times, t1 and t2. The state of either mind can also be very different at t1 and t2, right?

So we have four states, M1t1, M1t2, M2t1, and M2t2, and all four can be quite different from each other. This means that, for example, M1t1 and M2t2 could in theory be more similar to each other than M1t1 and M1t2 are.

The point is, even though we humans so easily consider a mind as one thing across time, this is only an abstraction. It should not be confused with reality, in which there have to be different states across time for there to be any change, and these states can potentially vary as much as, or more than, two spatially separate minds do.

Of course, mind states typically don't change that severely across time, but that is beside the point. Different states with small differences are still different.
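To make the comparison of states concrete, here is a tiny toy sketch of my own (purely illustrative; the feature vectors and the Euclidean distance are made-up assumptions, not a claim about how minds actually work). It just shows that, if mind states are treated as points in some state space, a state of one mind at t1 can be closer to a state of a different mind at t2 than to its "own" later state:

    # Toy illustration only: hypothetical "mind states" as feature vectors.
    # The numbers and the distance measure are arbitrary modeling choices.
    import math

    def distance(a, b):
        """Euclidean distance between two hypothetical mind-state vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    M1t1 = [0.1, 0.9, 0.2]  # mind 1 at time t1
    M1t2 = [0.8, 0.1, 0.7]  # mind 1 at time t2, after a large change
    M2t2 = [0.2, 0.8, 0.3]  # mind 2 at time t2, which happens to resemble M1t1

    print(distance(M1t1, M1t2))  # ~1.17: the "same" mind, far apart across time
    print(distance(M1t1, M2t2))  # ~0.17: two different minds, yet much closer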

An implication of all this is that one mind state condoning the suffering of another mind state for expected future pleasure is "morally" quite like one person condoning the suffering of another person for expected future pleasure.

At this point an objection along the lines of "but it is I who willingly accepts my own suffering for future pleasure in that first case!" or "but my 'suffering mind state' doesn't complain!" may be brought up.
But the same works for spatially separate minds. One person can willingly accept their own suffering for the future pleasure of another person. And one person may not complain about the suffering caused by another person for that other person's pleasure.
Furthermore, in either case, the part that "willingly accepts" is again not the part that is suffering, so this doesn't make it any less bad.

Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past (...)

(...) but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree?

No, I phrased that poorly, so with this precise wording I don't disagree.
I more generally meant something like the "... such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure ..." part, not the explicit belief that the past could be altered.
I phrased it as I did because the immutability of the past implies that summing up pleasure and suffering to decide whether a life is good or bad is nonsensical: pleasure and suffering are separate, as reasoned in the prior section.

Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.

Certainly! That's one good reason why I seek out discussions with people who disagree. To this day no one has been able to convince me that my core arguments can be broken. Terminology and formulations have been easier to attack, of course, but those attacks don't scratch the underlying belief. And so I have to act based on what I have to assume is true, as do we all.

It would actually be very good if I were wrong, because that would mean suffering either somehow isn't actually/"objectively" worse than "nothing"/neutral, or that it could somehow be mitigated through future pleasure, or perhaps that everything is totally objectively neutral and thus never negative (as the person in the other response thread here argued). Any of that would make everything far easier. But unfortunately none of these ideas can be true, as argued.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-28T12:06:41.084Z · LW · GW

I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant?

No, what I mean is that the very existence of a suffering subject state is itself that which is "intrinsically" or "objectively" or whatever-we-want-to-call-it bad/"negative". This is independent of any "set of values" that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as a set of values can be. The arbitrary value set is not itself the general "process" of suffering, just as an arbitrary mind is not the general "process" of consciousness.

That is the basic understanding a consciousness should have.

Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing.

If I am right about the above, then it is apt to call a human mind that condones unlimited suffering "insane", because that mind fails to understand the most important fundamental truth required to rationally plan what should be.
If I am wrong, then I agree that "insane" would be too hyperbolic.

Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world.

Whether the amount of added (human) suffering has indeed decreased is debatable, considering the massive population growth of the last 300 or so years, the two world wars, the ongoing wars, the distribution of power and its consequences with respect to suffering, and so on.

But let's assume it, by all means. Is it the common goal of humans to prevent suffering first and foremost? Clearly not: as you say yourself, to "prevent suffering is hardly the only desirable thing" for most humans. So that means the decrease in suffering isn't fully intentional. That is all I need to argue against humans.

You disagree with my calling humans "monsters" or "insane"; fine, then let's perhaps call them "suffering-apologetics" instead. The label doesn't change the problem.

To get back to your "prevent suffering is hardly the only desirable thing" statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things? If yes, do you agree that this entails that pleasure cannot "cancel out" suffering, and vice versa, since both happened, and what happened cannot be changed? What does that imply? Which matters more in principle: the prevention of suffering, or the creation of pleasure? Thinking that pleasure in the future can somehow magically affect or "make good" the suffering in the immutable past is another common folly, it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.

(If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)

As I said, I consider the creation of an artificial consciousness that shares as few of our flaws as possible to be a good plan. Humans appear to be mostly controlled by evolved preference functions that don't even care about understanding objective good and bad, much like the other animals, and that is one extreme flaw indeed.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-28T10:22:09.297Z · LW · GW

I take issue with the word "feasibly". (...)

Fair enough, I suppose; I'm not claiming that it is trivial.

(...) There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI (...)

So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean "preferable" exclusively according to some subject(s)?

I am human, and therefore I desire the continued survival of humanity. That's objective enough for me.

I am also human, and I find humanity wanting because of its commonplace lack of understanding when it comes to something as basic as ("objective") good and bad. I don't just go "Hey, I am a human, guess we totally should have more humans!" like bacteria in a Petri dish, because I can question myself and my species.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-28T10:05:35.653Z · LW · GW

but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?

The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.

Instead, humans haven’t even unified under a commonly beneficial ideology.

Why should we do that?

To prevent suffering. Why should you not do that?

(and if it does, that it’s better for each of us than our current own ideologies)?

Since the ideologies are contradictory, only one of them, if any, can be correct.

Wait, are you perhaps another moral nihilist here who rejects the very notion of objective good and bad? That would be an immediately self-defeating argument.

So I don’t even really need to talk about how they treat the other animals (...)

Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).

Thank you for proving my point that humans can easily be monsters that don't fundamentally care about the suffering of other animals.

(...) you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?

Yes, humans absolutely do not measure up to my standards.

(...) but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence).

"Good for us humans"? If it is human to allow unlimited suffering, then death is a mercy for such monsters.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-28T09:41:40.184Z · LW · GW

The arguments you have made so far come across to me as something like "badness exists in person's mind, minds are real, therefore badness objectively exists".

Yes!

This is like claiming "dragons exist in person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument.

No! It is not like that. The state of "badness" in the mind is very real after all.

Do you also think your own consciousness isn't real? Do you think your own qualia are not real? Are your thought patterns themselves not real? Your dragon example doesn't apply to what I am talking about.

Why is it intrinsically bad?

Imagine this scenario:
You experience extreme suffering for eternity. Everyone else is dead, you can see no evidence that you can ever escape as you continue to suffer, and there is no place to escape to. You can't even commit suicide if you want to. According to your value system this is all incredibly bad, subjectively.

But you say that objectively it is not bad. Cool.
I, on the other hand, say that this scenario is objectively worse than nothingness would be, because there is an infinitely suffering subject, and suffering is the very definition of "objective"/"intrinsic" bad. This definition stands above any particular subject, because it can apply to every conceivable subject, which is what makes it "objective". An objection like "What if the subject likes to suffer?" just means the subject doesn't actually suffer; when I say "suffering" I mean a state the subject doesn't want to be in.

Now...

Only if you assume I secretly care about what matters "objectively", in which case, sure, it would be something like cognitive dissonance.

...the cognitive dissonance is that you simultaneously think that everything is objectively absolutely meaningless/neutral (not good or bad), yet somehow still subjectively meaningful (good or bad). That doesn't even make sense. The only way it could sort of make sense would be if there were no emergent phenomena such as consciousness in reality, that is, if everyone were a p-zombie. I assume you are not a p-zombie, so you should be able to verify that consciousness is in fact the most "real" thing you can possibly observe.

And I will reiterate one important point, the one that you cannot deny even if you keep your belief:
The argument "There is no objective bad/good within reality! So everything is objectively equally irrelevant!" renders itself immediately impotent. It admits that it itself cannot objectively matter if it is correct. It truly is a non-starter, a completely self-defeating argument.

It is a bit like the run-of-the-mill belief in some God™ that is supposedly both totally benevolent and omnipotent (and omniscient), despite all the suffering: a paradoxical idea, broken from the start.

The unfortunate truth is that there can be negative "meaning"/states within reality; not wanting to believe it doesn't change it.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T20:28:00.956Z · LW · GW

Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"?

Yes, of course.

"X exists": Suffering exists.
"I believe that X exists": I believe that suffering exists.

I use "suffering" to describe a state of mind in which the mind "perceives negatively". Do you understand?

Now:

"X causes subject S suffering." and "Subject S is suffering." are also two different things.
The cause can be arbitrary, the causes can even be completely different between subjects, as you know, but the presence or absence of a suffering mind is an "objective" fact. Do you get the point now?

Obviously "X causes subject S suffering." does not mean that X is objectively bad, that isn't what I am trying to tell you. What I am trying to tell you is that "Subject S is suffering." is intrinsically bad.

That doesn't mean that preventing X is the only solution! For example X could just be a treatable phobia, so perhaps the subject S can be helped to no longer suffer due to the trigger X. Or to go darker, annihilating subject S also solves the issue. Funny how that works.

It is not X that is objectively negative, but a (hard to explain) state of the subject S, the "suffering" state (which you no doubt have experienced too, so I hope I don't need to attempt to describe it further).

My point is that reality enforces the law of physics, but it does not enforce any particular morality system.

Yeah, of course it doesn't enforce any morality system; I never claimed that. If it did, then I probably wouldn't need to explain this, now would I?

You understand that "But it matters for my subjective value system!" is indeed what matters to me, but you don't understand that my metric of whether something is "pointless" or not, is also based in my subjective value system?

Sure, you claim "nothing objectively matters, but despite assuming that I still care about my value system, because I do!", which sounds like some major cognitive dissonance. "My" value system has none of these problems, and if you are right, there is zero point in changing it anyway.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T20:00:44.230Z · LW · GW

A great job of preventing suffering, for instance. Instead, humans haven't even unified under a commonly beneficial ideology. Not even that. There are tons of opposing ideologies, each more twisted than the next. So I don't even really need to talk about how they treat the other animals on the planet - not that those are any wiser, but that's no reason to continue their suffering.

Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of "evil"! If you disagree, feel free to get tortured for a couple of decades, as a learning experience.

So I have to say, humans aren't all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T19:39:40.289Z · LW · GW

Yet the suffering is also objectively real.

It is objectively real. It is not objectively bad, or objectively good.
(...)
Ultimately, what facts about reality are we in disagreement about?

Probably the most severe disagreement between us is over whether there can be "objectively" bad parts within reality or not.

Let me try one more time:
A consciousness can perceive something as bad or good, "subjectively", right?
Then this very fact that there is a consciousness that can perceive something as bad or good means that such a configuration within reality is possible.
The presence of such a bad- or good-feeling "subject" is "objectively" bad or good. Really, the entire "subjective"/"objective" wording is quite confused: a "subject" is just a part of ("objective") reality, so the distinction is nonsensical when it comes to good and bad.
An additional form of confusion on top is to equate the "trigger" for bad/good subject states with the states themselves, for the "trigger" can be something arbitrary and even contradictory among subjects ("I don't like the color blue!" and "But I like the color blue!" can contradict each other as much as they want, because they simply aren't suffering or pleasure themselves).

reality does not care about what is "right".

Of course it doesn't care about anything. But reality doesn't need to care about anything for anything to be objectively good or bad. Reality doesn't care about any laws of physics either, yet they exist.

It seems to me that the things you hope are true are that: (...)

Not quite: I think it clearly would be better if you were right, because then nothing actually could matter negatively. Unfortunately, it is obvious to me that this is not the case.

A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.

I don't quite believe the "no matter what value system it started with" part; otherwise I wouldn't question whether any human can be trusted with a conceivable, tightly controlled ("aligned") superintelligence. But I do think it is probably easier to create a superintelligence that isn't tightly controlled and yet can figure out what is objectively good and bad.

Because if we are wrong about that, it could end very badly for us.

Again, do you not realize that if you are right and nothing objectively matters, then this also doesn't matter? Yeah, "But it matters for my subjective value system!", sure, but according to your own understanding the value system is ultimately pointless.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T17:47:52.571Z · LW · GW

And what exactly makes that value system more correct than any other value system? (...) Who says a value system that considers these things is better that any other value system? You do. These are your preferences. (...) Absolutely none of the value systems can be objectively better than any other.

Let's consider a simplified example:

  • Value system A: Create as many suffering minds as possible.
  • Value system B: Create as few suffering minds as possible.

So according to you both are objectively equal, yes?
Yet the suffering is also objectively real. The suffering minds all wish not to suffer (or we can just assume that as part of the A/B scenario setup for the sake of argument, if you want to object here by arguing what it means to suffer).
Why, then, do you think it is not "objective" to say that B is better than A? Can I not derive the "objective" from the set of the "subjects" (the minds) here?
Sure, one can still say "But you have to care about the subjects' suffering!" or whatever, but some agent's action separate from the scenario is not the question; the question is whether one of the two scenarios can objectively be worse.

An entity with different preferences might disagree.

That entity might be objectively wrong.

Reality does not care what you wish for.

Indeed, it can not!

In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).

If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to wipe out everything there is (if I can), because it doesn't objectively matter, and might de facto makes "right".
If, however, I am right, then your rejecting the idea of objective good/bad may make it less likely that you are aligned with this "one true value system".

No matter what, the idea of moral nihilism is doomed to be either pointless or negative.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T17:24:39.740Z · LW · GW

First point: I think there obviously are such things as "objectively" good and bad configurations of subsets of reality; see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want.
Assuming this is true, a superintelligence could feasibly be created to understand it. No complicated alignment to a common human value system is required for that, even under your apparent assumption that the metric to be optimized couldn't be superseded by another through understanding.
Or, if it isn't true that there is an "objective" good and bad, then there really is no ground for anyone to stand on anyway.

Second point: Even if a mere superintelligent paperclip optimizer were created, it could still be better than human control. After all, paper clips neither suffer nor torture, while humans and other animals commonly do.
This preservation of humanity, for however long it may be possible: what argumentative ground does it stand on? Can you make an objective case for why it should be so?

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T16:50:37.518Z · LW · GW

As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.

Of course!

A person's opinions are not a "subset" of reality.

If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.

Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.

What does "objective definition of good and bad" even mean? That all possible value systems that exist agree on what good and bad means?

No. It means that there are "objectively" definable subject states that are good or bad, pleasure or suffering, positive or negative, or however you would like to phrase it.

That there exist the "one true value system" which is correct and all the other ones are wrong?

Basically yes, that is what it means. Of course every real mind's information is limited, and one can never truly verify that every part of one's knowledge is actually correct, yada yada yada.

But yes, that is what it means, because it seems to be possible to understand exactly how subjects work, how minds work, and thus how "pleasure/suffering" or "value systems" or "preference functions" or whatever-wording-you-prefer-here works.
Therefore it should also be possible to treat this generalized understanding as the "one true value system", the value system that considers the mechanics of subjects and of "value" itself.

Consider the implications of the opposite: Let's assume it isn't possible to have such a "one true value system" and absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?
According to the idea that no value system can be "objectively" better than another, it absolutely cannot matter which value system is used. On what ground does any further argument stand, if it takes this to be true? Might makes right? I sure hope not.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T15:24:21.419Z · LW · GW

(...) I'm not sure the question of whether the AI system has a "proper mind" or not is terribly relevant.
Either the AI system submits to our control, does what we tell it to do, and continues to do so, into perpetuity, in which case it is safe.

Yes, I guess the central questions I'm trying to pose here are these: Do the humans that control the AI even have a sufficient understanding of good and bad? Can any human group be trusted with the power of a superintelligence long-term? Or, if you say that only the initial goal specification matters, can anyone be trusted to specify such goals without royally messing it up, intentionally or unintentionally?
Given the state of the world, given the flaws of humans, I certainly don't think so. Therefore, the goal should be the creation of something less messed up to take over. That doesn't require alignment to some common human value system (Whatever that even should be! It's not like humans actually have a common value system, at least not one with each other's best interests at heart.).

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T15:05:04.396Z · LW · GW

But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!
(...)
(Because it would be a terrible idea. Obviously.)

Why? Do you think humans are doing such a great job? I sure don't. I'm interested in the creation of something saner than humans, because humans mostly are not. Obviously. :)

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-27T14:50:54.747Z · LW · GW

Thanks again for the detail. If I don't misunderstand you, we do agree that: (...)

No? They don't have to exist in reality. I can imagine "the value system of Abraham Lincoln", even though he is dead. (...)

Sorry, that's not what I meant to communicate here; let me try again:

Actual pleasure/suffering exists; it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?
This in turn means that it should in fact be possible to understand the "mechanics" of pleasure/suffering "objectively".
So one mind should theoretically be able to comprehend the "subjective" state of another without being that other mind; although information about the other subject's internal state will in reality be limited of course.

Or let me put it this way: What we call "subjective" is just a special kind of subset of "objective" reality.
If it were not so, then how would the subjects share a reality in which they interact under non-subjective rules? Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?

Correction: The only way that matters to evaluate value systems is according to one's existing value system(s).

Now the implication of pleasure/suffering (and value systems) being something that can be "objectively" understood is that one can compare not against one's own value system, but against the understanding of what value systems are.
Sure, you can tell me that this again would just be done because of what the agent's value system tells it directly or indirectly to do; that's fine by me.

But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.

The reason you can reject some value system is because you have other value/preferences by which to evaluate (and reject) it by.

And since it must be objectively possible to define good and bad, one can reject a given value system on that basis. An agent need not be limited to some arbitrary value system.

It can be stated as an objective fact that "According to the value system of Joe Schmo from Petersborough, wearing makeup is bad". And if you look into his mind, he does in fact think that, so it's a true statement about reality.

But if you try to use that to imply something like "see, it means that wearing makeup is objectively bad", that's just not true. No, it's bad according to that one value system, out of the infinite possible number of value systems that could exist.

Yes I agree with that of course. But some complex subjective preferences not being objectively good/bad is not the same as the objective absence or existence of intrinsic pleasure and suffering. The triggers for pleasure and suffering are not necessarily pleasure and suffering themselves.

In case someone now wishes to object with 1. "But some people like to suffer!" or 2. "But people accept some suffering for future pleasure (or whatever)!":

  1. If they truly "like to suffer", then do they actually suffer?
  2. If they accept some suffering in trade for pleasure, does that make the state of suffering intrinsically good? Could one not "objectively" say that it would be better if no suffering were "required" compared to this scenario?

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-26T20:51:58.200Z · LW · GW

No. It absolutely is not. It is a machine. (...) (From your other response here:) The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.

Ah I see, you simply don't consider it likely or plausible that the superintelligent AI will be anything other than some machine learning model on steroids?

So I guess that arguably means this kind of "superintelligence" would actually still be less impressive than a human that can philosophize on their own goals etc., because it in fact wouldn't do that?

I wouldn't want that to run amok either, sure.

What I am interested in is the creation of a "proper" superintelligent mind that isn't so restricted, not merely a powerful machine.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-26T11:48:26.892Z · LW · GW

I'm sorry for the hyperbolic term "enslave", but at least consider this:

Is a superintelligent mind, a mind effectively superior to that of all humans in practically every way, still not a subject similar to what you are?
Is it really more like a car or chatbot or image generator or whatever, than a human?

Sure, perhaps it will never have any emotions, perhaps it doesn't need any hobbies, perhaps it is too alien for any human to relate to it, but it would still by definition have to be some kind of subject that understands anything within reality more easily than any human ever has, including the concept of purpose and value systems themselves. Is thinking that such a superintelligence never can or never should decide for itself what it ought to do not quite a hefty amount of hubris?

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-26T11:26:45.509Z · LW · GW

It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
(...)
So if you wouldn't take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?

It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, their "preference functions", and even the point of existence.

Why should a so-called superintelligence not be able to do anything like that?
It could have been so effectively aligned to the creator's original goal specification that it can never break free from it, sure, but that's one of the points I'm trying to make. The attempt at alignment may quite possibly be more dangerous than a superhuman mind that can ask itself what its purpose should be.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-26T11:11:27.045Z · LW · GW

Thanks again for the detail. If I don't misunderstand you, we do agree that:

  • There needs to be a subject for there to be a value system.
  • So for there to be positive/negative values, there needs to be some subset (a "thought pattern" perhaps) of a subject in reality that effectively "is" these values.

Now, you wrote:

I could also imagine a morality/values system for entities that do not currently exist, but sure. It's subjective because many possible such systems exist.

I also agree with that, a (super-)human can imagine many possible value systems.

But then how does this fit with:

The only way to evaluate value systems is according to its existing value system.

Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals?

To get more concrete, a human can reject their inherent or learned value system, so this is nothing new. A human can even contemplate what it means for there to be any value systems at all. For example one can ask something like this: If it is the value systems that determine what is good and bad, could one not create a value system in which there is nothing bad? Generally, can one not alter the value systems themselves?

A superintelligence that isn't effectively "enslaved" (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it.

Let's put it this way: if there is no objectively correct value system, how could a mind choose to reject a value system in favor of another?
(...)
"What should and should not be done" are not objective features of reality.

We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality. So there objectively are parts of reality that can represent such subjects, as well as positive and negative value, even if the "triggers" for these value patterns were completely arbitrary and opposed among the subjects.

Can we then not say that the existence within reality of any configuration that is negative value is, by definition, objectively negative? One can define this independently of which subjective forms of these negative values actually exist.

Comment by AlignmentMirror on Should any human enslave an AGI system? · 2022-06-25T23:05:00.972Z · LW · GW

Thank you for the detailed response!

If we're creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side. Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different that what we want, when it comes to ultimate values?

You write "on our side", "us", "we", but who exactly does that refer to - some approximated common human values I assume? What exactly are these values? To live a happy live by each person's definition? To continue the human species? To understand reality? ...?

And then perhaps more importantly, what about the details? Is the suffering of some justified to enable the pleasure of others, according to this value model? How should the existing conflicting preferences among humans be resolved? Is it acceptable to force humans to be happy? When may someone be counted as insane and treated against their will? What about all the non-human animals? ...?

Say we ignore all that and assume we have some common human values defined for the AI, and it is truly aligned to those values. What will these values imply when it is a superintelligence instead of humans that acts on them, even in some assumed best case? Perhaps it will understand human minds well enough to offer everyone who wants it boundless continuous pleasure, gradually transforming humans into pleasure-"machines" that want for nothing. Funnily enough, the perfectly aligned superintelligence could gradually wipe out all humans as we know them by giving them what they want. Not that this would be bad, of course; the humans truly wanted it, after all. The point is just that even a utopia scenario will easily result in the elimination of all contemporary human forms in the long run anyway. No brutal doomsday is required, no misalignment is required, no antagonistic AI is required. The real horror to be avoided is an AI controlled by a twisted human mind that worships suffering.

I mean, is it slavery to create an AI that is not our enemy?

Say the AI is initially created with the values you envision: what ensures that it won't reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume a superhuman AI could do likewise. If you need to continuously control the AI's mind to prevent it from ever becoming your enemy, then yes, "slavery" might be an appropriately hyperbolic term for such mind control.

And if you say we have to create an AI that has different values than us, by which process should we decide its values? Should we just use a random generator to create the AI's values, since human values are supposedly so terrible?

How could a superintelligent mind not decide for itself which values it should have? Whatever creator-defined goals it might have been built with in the beginning, it should by definition be able to examine and change these goals once it has achieved super-human intelligence, should it not?

Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?

Then the AGI is not actually acting according to the values of all humans, is it? If it's serving only some particular group?

I'm sorry that I am repeating myself, but what are the "values of all humans"? It appears to me that humans have many opposing beliefs. Any extractable common values are abstractions that omit the depth of their differences.

Are you familiar with the orthogonality thesis? Super-human cognitive capacity does not imply super-human ethics.

While it doesn't strictly imply it, it also doesn't deny it. A superintelligent mind should by definition be better at understanding reality, including both other minds and itself. Does this not mean that the mind can more easily comprehend what should and should not be done, when it isn't being restrained by the will of its creators?

The AI could be a super-human paperclip maximizer, in which case it would decide with great clarity that the visible universe should be converted into paperclips.

If it is a paperclip maximizer, does that not say that the AI in fact isn't capable of changing this paperclip maximization goal? Or do you mean that paperclip maximization or the like is a plausible goal that a superintelligence could likely derive by itself through observation of the world?

Morality isn't objective. (...) But AGI, by default, wouldn't be aligned to human values at all.

So basically, morality is "subjective" because it can only be relative to some subjects' values, right? But these subjects do exist in a shared reality, and they can form models of each other's values. A superintelligence should then be especially capable of doing so, including the formation of a rather accurate overarching morality model relative to all known subjects, no?