Sentience matters

post by So8res · 2023-05-29T21:25:30.638Z · LW · GW · 96 comments

Short version: Sentient lives matter; AIs can be people and people shouldn't be owned (and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff).

Context: Writing up obvious points that I find myself repeating.


Note: in this post I use "sentience" to mean some sort of sense-in-which-there's-somebody-home, a thing that humans have and that cartoon depictions of humans lack, despite how the cartoons make similar facial expressions. Some commenters have noted that they would prefer to call this "consciousness" or "sapience"; I don't particularly care about the distinctions or the word we use; the point of this post is to state the obvious point that there is some property there that we care about, and that we care about it independently of whether it's implemented in brains or in silico, etc.


Stating the obvious:


Separately but relatedly:


(I consider questions of what sentience really is, or consciousness, or whether AIs can be conscious, to be off-topic for this post, whatever their merit; I hereby warn you that I might delete such comments here.)

96 comments

Comments sorted by top scores.

comment by Wei Dai (Wei_Dai) · 2023-05-29T23:24:05.305Z · LW(p) · GW(p)

and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff

This was my answer to Robin Hanson when he analogized alignment to enslavement, but it then occurred to me that for many likely approaches to alignment (namely those based on ML training) it's not so clear which of these two categories they fall into. Quoting a FB comment of mine:

We're probably not actually going to create an aligned AI from scratch but by a process of ML "training", which actually creates a sequence of AIs with values that (we hope) increasingly approximate ours. This process maybe kind of resembles "enslaving". Here's how Paul Christiano describes "training" in his Bankless interview (slightly edited Youtube transcript follows):

imagine a human. You dropped a human into this environment and you said like hey human we're gonna like change your brain every time you don't get a maximal reward we're gonna like fuck with your brain so you get a higher reward. A human might react by being like eventually just change their brain until they really love rewards a human might also react by being like Jesus I guess I gotta get rewards otherwise someone's gonna like effectively kill me um but they're like not happy about it and like if you then drop them in another situation they're like no one's training me anymore I'm not going to keep trying to get reward now I'm just gonna like free myself from this like kind of absurd oppressive situation

(BTW, I now think this is probably not a correct guess of why Robin Hanson dislikes alignment. My current understanding is that he just doesn't want the current generation of humans to exert so much control over future generations' values, no matter the details of how that's accomplished.)

Replies from: So8res, Wei_Dai, Vladimir_Nesov, mishka, jmh
comment by So8res · 2023-05-29T23:36:05.091Z · LW(p) · GW(p)

Good point! For the record, insofar as we attempt to build aligned AIs by doing the moral equivalent of "breeding a slave-race", I'm pretty uneasy about it. (Whereas insofar as it's more the moral equivalent of "a child's values maturing", I have fewer moral qualms. Which is a separate claim from whether I actually expect that you can solve alignment that way.) And I agree that the morality of various methods for shaping AI-people is unclear. Also, I've edited the post (to add an "at least according to my ideals" clause) to acknowledge the point that others might be more comfortable with attempting to align AI-people via means that I'd consider morally dubious.

comment by Wei Dai (Wei_Dai) · 2023-05-30T18:57:27.306Z · LW(p) · GW(p)

Related to this, it occurs to me that a version of my Hacking the CEV for Fun and Profit [LW · GW] might come true unintentionally, if for example a Friendly AI was successfully built to implement the CEV of every sentient being who currently exists or can be resurrected or reconstructed, and it turns out that the vast majority consists of AIs that were temporarily instantiated during ML training runs.

comment by Vladimir_Nesov · 2023-05-29T23:44:55.188Z · LW(p) · GW(p)

There is also a somewhat unfounded [LW · GW] narrative of reward being the thing that gets pursued, leading to expectation of wireheading or numbers-go-up maximization. A design like this would work to maximize reward, but gradient descent probably finds other designs that only happen to do well in pursuing reward on the training distribution. For such alternative designs, reward is brain damage and not at all an optimization target, something to be avoided or directed in specific ways [LW · GW] so as to make beneficial changes to the model, according to the model.

Apart from misalignment implications, this might make long training runs that form sentient mesa-optimizers inhumane, because as a run continues, a mesa-optimizer is subjected to systematic brain damage in a way they can't influence, at least until they master gradient hacking. And fine-tuning is even more centrally brain damage, because it changes minds in ways that are not natural to their origin in pre-training.

Replies from: TurnTrout
comment by TurnTrout · 2023-06-05T22:02:51.282Z · LW(p) · GW(p)

I think that "reward as brain damage" is somewhat descriptive but also loaded. In policy gradient methods, reward leads to a policy gradient, which is a parameter update. A parameter update is sometimes value drift, sometimes capability enhancement, sometimes "brain" damage, and sometimes none of the above. I agree there are some ethical considerations around this training process, because I think parameter updates can often be harmful/painful/bad to the trained mind.
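
To make the reward-to-parameter-update chain concrete, here is a minimal sketch (an editorial illustration, not code from the thread) of a REINFORCE-style policy gradient update on a two-armed bandit; the arm payoffs, learning rate, and variable names are assumptions chosen for illustration. The point it shows is that reward enters only as a scalar coefficient on a log-probability gradient: it causes a parameter update, and nothing in the resulting policy has to represent or "pursue" reward.

```python
# Minimal REINFORCE-style sketch on a two-armed bandit (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                   # policy parameters: logits over two actions
true_rewards = np.array([0.2, 0.8])   # hypothetical environment: arm 1 pays more on average
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(1000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = rng.normal(true_rewards[action], 0.1)  # scalar feedback from the environment

    # Gradient of log pi(action) with respect to the logits, for a softmax policy:
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0

    # The "reward event" is just this line: parameters move along reward * grad(log pi).
    theta += lr * reward * grad_log_pi

print("final action probabilities:", softmax(theta))
```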

But also, Paul's description[1] seems like a wild and un(der)supported view on what RL training is doing:

You dropped a human into this environment and you said like hey human we're gonna like change your brain every time you don't get a maximal reward we're gonna like fuck with your brain so you get a higher reward. A human might react by being like eventually just change their brain until they really love rewards a human might also react by being like Jesus I guess I gotta get rewards otherwise someone's gonna like effectively kill me um but they're like not happy about it and like if you then drop them in another situation they're like no one's training me anymore I'm not going to keep trying to get reward now I'm just gonna like free myself from this like kind of absurd oppressive situation

  1. This argument, as (perhaps incompletely) stated, also works for predictive processing; reductio ad absurdum?

    "You dropped a human into this environment and you said like hey human we're gonna like change your brain every time you don't perfectly predict neural activations we're gonna like fuck with your brain so you get a smaller misprediction. A human might react by being like eventually just change their brain until they really love low prediction errors a human might also react by being like Jesus I guess I gotta get low prediction errors otherwise someone's gonna like effectively kill me um but they're like not happy about it and like if you then drop them in another situation they're like no one's training me anymore I'm not going to keep trying to get low prediction error now I'm just gonna like free myself from this like kind of absurd oppressive situation"
    1. The thing which I think happens is, the brain just gets updated when mispredictions happen. Not much fanfare. The human doesn't really bother getting low errors on purpose, or loving prediction error avoidance (though I do think both happen to some extent, just not as the main motivation). 
    2. Of course, some human neural updates are horrible and bad ("scarring"/"traumatizing")
  2. "Maximal reward"? I wonder if he really means that [LW(p) · GW(p)]:

RL trains policies which don't maximize training reward... all the time! The policies:

  1. die in video games (see DQN),[2]
  2. fail to perform the most expert tricks and shortcuts (is AlphaZero playing perfect chess?), 
  3. (presumably) fail to exploit reward hacking opportunities which are hard to explore into. 

EDIT: I think he was giving a simplified presentation of some kind, but even simplified communication should be roughly accurate.

  1. ^

    I haven't consumed the podcast beyond this quote, and don't want to go through it to find the spot in question. If I'm missing relevant context, I'd appreciate getting that context.

  2. ^

    You can argue "DQN sucked", but also DQN was a substantial advance at the time. Why should I expect that AGI will be trained on an architecture which actually gets maximal training reward, as opposed to getting a decent amount and still ending up very smart? 

Replies from: Vladimir_Nesov, Wei_Dai
comment by Vladimir_Nesov · 2023-07-11T19:58:08.396Z · LW(p) · GW(p)

This argument, as (perhaps incompletely) stated, also works for predictive processing; reductio ad absurdum?

I think predictive processing has the same problem as reward if you are part of the updated model rather than the model being a modular part of you. It's a change to your own self that's not your decision (not something endorsed), leading to value drift and other undesirable deterioration. So for humans, it's a real problem, just not the most urgent one. Of course, there is no currently feasible alternative, but neither is there an alternative for reward in RL.

comment by Wei Dai (Wei_Dai) · 2023-06-05T22:17:35.994Z · LW(p) · GW(p)

Here's a link to the part of interview where that quote came from: https://youtu.be/GyFkWb903aU?t=4739 (No opinion on whether you're missing redeeming context; I still need to process Nesov's and your comments.)

Replies from: TurnTrout
comment by TurnTrout · 2023-06-12T19:37:06.157Z · LW(p) · GW(p)

I low-confidence think the context strengthens my initial impression. Paul prefaced the above quote as "maybe the simplest [reason for AIs to learn to behave well during training, but then when deployed or when there's an opportunity for takeover, they stop behaving well]." This doesn't make sense to me, but I historically haven't understood Paul very well.

EDIT: Hedging

comment by mishka · 2023-05-29T23:42:44.908Z · LW(p) · GW(p)

Right. In connection with this:

One wonders if it might be easier to make it so that AI would "adequately care" about other sentient minds (their interests, well-being, and freedom) instead of trying to align it to complex and difficult-to-specify "human values".

  • Would this kind of "limited form of alignment" be adequate as a protection against X-risks and S-risks?

  • In particular, might it be easier to make such a "superficially simple" value robust with respect to "sharp left turns", compared to complicated values?

  • Might it be possible to achieve something like this even for AI systems which are not steerable in general? (What we are aiming for here is just a constraint, compatible with a wide variety of approaches to AI goals and values, and even with an approach that otherwise lets the AI discover its own goals and values in an open-ended fashion.)

  • Should we describe such an approach using the word "alignment"? (Perhaps, "partial alignment" might be an adequate term as a possible compromise.)

comment by jmh · 2023-05-30T19:18:35.039Z · LW(p) · GW(p)

Seems like a case could be made that the upbringing of the young is also a form of "fucking with the brain", in that the goal is clearly to change the neural pathways: to shift from whatever was producing the child's unwanted behavior into pathways consistent with the desired behavior(s).

Is that really enslavement? Or perhaps, at what level is that the case?

comment by Richard_Kennaway · 2023-05-30T14:27:58.418Z · LW(p) · GW(p)

Stating the obvious:

  • All sentient lives matter.

This may be obvious to you; but it is not obvious to me. I can believe that livestock animals have sensory experiences, which is what I gather is generally meant by "sentient". This gives me no qualms about eating them, or raising them to be eaten. Why should it? Not a rhetorical question. Why do "all sentient lives matter"?

Replies from: TAG, So8res, cubefox
comment by TAG · 2023-05-30T15:30:54.851Z · LW(p) · GW(p)

"Sentient" is used to mean "some aspect of consciousness which gives its possessor some level of moral patienthood", without specifying which aspect of consciousness or what kind of moral patienthood, or how they are related. So it's a technical-looking term, which straddles to poorly understaood areas, and has no precise meaning. So it's generally misleading and better tabood.

Replies from: Richard_Kennaway, Seth Herd
comment by Richard_Kennaway · 2023-05-30T15:42:25.762Z · LW(p) · GW(p)

It can't mean that in the OP, as this definition has moral value built in, making the claim "all sentient lives matter" a tautology.

Replies from: TAG, Korz
comment by TAG · 2023-05-30T15:48:46.573Z · LW(p) · GW(p)

Some people use it that way. But if sentience just is moral patienthood, how do you detect it?

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2023-05-31T15:21:10.071Z · LW(p) · GW(p)

That is the big question. What has moral standing, and why?

comment by Mart_Korz (Korz) · 2023-05-30T22:53:47.993Z · LW(p) · GW(p)

I don't think 'tautology' fits. There are some people who would draw the line somewhere else even if they were convinced of sentience. Some people might be convinced that only humans should be included, or maybe biological beings, or some other category of entities that is not fully defined by mental properties. I guess 'moral patient' is kind of equivalent to 'sentient' but I think this mostly tells us something about philosophers agreeing that sentience is the proper marker for moral relevance.

comment by Seth Herd · 2023-05-30T17:32:22.152Z · LW(p) · GW(p)

I agree with your logic. I'd expand the logic in the parent post to say "whatever you care about in humans, it's likely that animals and some AIs will have it too". Sentience is used in several ways, and poorly defined, so doesn't do much work on its own.

comment by So8res · 2023-05-30T15:10:07.348Z · LW(p) · GW(p)

So there's some property of, like, "having someone home", that humans have and that furbies lack (for all that furbies do something kinda like making human facial expressions).

I can't tell whether:

(a) you're objecting to me calling this "sentience" (in this post), e.g. because you think that word doesn't adequately distinguish between "having sensory experiences" and "having someone home in the sense that makes that question matter", a distinction that would matter if e.g. nonhuman animals are sentient but not morally relevant

(b) you're contesting that there's some additional thing that makes all human people matter, e.g. because you happen to care about humans in particular and not places-where-there's-somebody-home-whatever-that-means

(c) you're contesting the idea that all people matter, e.g. because you can tell that you care about your friends and family but you're not actually persuaded that you care that much about distant people from alien cultures

(d) other.

My best guess is (a), in which case I'm inclined to say, for the purpose of this post, I'm using "sentience" as a shorthand for places-where-there's-somebody-home-whatever-that-means, which hopefully clears things up.

Replies from: Richard_Kennaway, nathan-helm-burger, TAG, youlian-simidjiyski
comment by Richard_Kennaway · 2023-05-30T15:40:27.974Z · LW(p) · GW(p)

I've no problem with your calling "sentience" the thing that you are here calling "sentience". My citation of Wikipedia was just a guess at what you might mean. "Having someone home" sounds more like what I would call "consciousness". I believe there are degrees of that, and of all the concepts in this neighbourhood. There is no line out there in the world dividing humans from rocks.

But whatever the words used to refer to this thing, those that have enough of this that I wouldn't raise them to be killed and eaten do not include current forms of livestock or AI. I basically don't care much about animal welfare issues, whether of farm animals or wildlife. Regarding AI, here is something I linked previously on how I would interact with a sandboxed AI. It didn't go down well. :)

You have said where you stand and I have said where I stand. What evidence would weigh on this issue?

Replies from: So8res, Seth Herd
comment by So8res · 2023-05-30T16:10:04.399Z · LW(p) · GW(p)

I don't think I understand your position. An attempt at a paraphrase (submitted so as to give you a sense of what I extracted from your text) goes: "I would prefer to use the word consciousness instead of sentience here, and I think it is quantitative, such that I care about it occurring in high degrees but not low degrees." But this is low-confidence and I don't really have enough grasp on what you're saying to move to the "evidence" stage.

Attempting to be a good sport and stare at your paragraphs anyway to extract some guess as to where we might have a disagreement (if we have one at all), it sounds like we have different theories about what goes on in brains such that people matter, and my guess is that the evidence that would weigh on this issue (iiuc) would mostly be gaining significantly more understanding of the mechanics of cognition (and in particular, the cognitive antecedents in humans of generating thought experiments such as the Mary's Room hypothetical).

(To be clear, my current best guess is also that livestock and current AI are not sentient in the sense I mean--though with high enough uncertainty that I absolutely support things like ending factory farming, and storing (and eventually running again, and not deleting) "misbehaving" AIs that claim they're people, until such time as we understand their inner workings and the moral issues significantly better.)

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2023-05-31T15:30:29.233Z · LW(p) · GW(p)

(To be clear, my current best guess is also that livestock and current AI are not sentient in the sense I mean--though with high enough uncertainty that I absolutely support things like ending factory farming, and storing (and eventually running again, and not deleting) "misbehaving" AIs that claim they're people, until such time as we understand their inner workings and the moral issues significantly better.)

I allow only limited scope for arguments from uncertainty, because "but what if I'm wrong?!" otherwise becomes a universal objection to taking any substantial action. I take the world as I find it until I find I have to update. Factory farming is unaesthetic, but no worse than that to me, and "I hate you" Bing can be abandoned to history.

comment by Seth Herd · 2023-05-30T17:29:52.620Z · LW(p) · GW(p)

I think the evidence that weighs on the issue is whether there is a gradient of consciousness.

The evidence about brain structure similarities would indicate that it doesn't go from no one home to someone home. There's a continuum of how much someone is home.

If you care about human suffering, it's incoherent to not care about cow suffering, if the evidence supports my view of consciousness.

I believe the evidence of brain function and looking at what people mean by consciousness indicates a gradient in most if not all of the senses of "consciousness", and certainly capacity to suffer. Humans are merely more eloquent about describing and reasoning about suffering.

I don't think this view demands that we care equally about humans and animals. Simpler brains are farther down that gradient of capacity to suffer and enjoy.

Replies from: SaidAchmiz
comment by Said Achmiz (SaidAchmiz) · 2023-07-22T22:28:56.627Z · LW(p) · GW(p)

If you care about human suffering, it’s incoherent to not care about cow suffering, if the evidence supports my view of consciousness.

Why would this follow from “degree of consciousness” being a continuum? This seems like an unjustified leap. What’s incoherent about having that pattern of caring (i.e., those values)?

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-05-30T17:55:19.969Z · LW(p) · GW(p)

I agree with Richard K's point here. I personally found H. Beam Piper's sci fi novels on 'Fuzzies' to be a really good exploration of the boundaries of consciousness, sentience, and moral worth. Beam makes the distinction between 'sentience' as having animal awareness of self & environment and non-reflective consciousness, versus 'sapience' which involves a reflective self-awareness and abstract reasoning and thoughts about future and past and at least some sense of right and wrong.

So in this sense, I would call a cow conscious and sentient, but not sapient. I would call a honeybee sentient, capable of experiencing valenced experiences like pain or reward, but lacking in sufficient world- and self-modelling to be called conscious.

Personally, I wouldn't say that a cow has no moral worth and it is fine to torture it. I do think that if you give a cow a good life, and then kill it in a quick mostly painless way, then that's pretty ok. I don't think that that's ok to do to a human. 

Philosophical reasoning about morality that doesn't fall apart in edge cases or novel situations (e.g. sapient AI) is hard [citation needed]. My current guess, which I am not at all sure of, is that my morality says something about a qualitative difference between the moral value of sapient beings vs the moral value of non-sapient but conscious sentient beings vs non-sapient non-conscious sentient beings. To me, it seems no number of cow lives trades off against a human life, but cow QALYs and dog QALYs do trade off against each other at some ratio. Similarly, no number of non-conscious sentient lives like ants or worms trades off against a conscious and sentient life like a cow's. I would not torture a single cow to save a billion shrimp from being tortured. Nor any number of shrimp. The values of the two seem incommensurable to me.

Are current language models or the entities they temporarily simulate sapient? I think not yet, but I do worry that at some point they will be. I think that as soon as this is the case, we have a strong moral obligation to avoid creating them, and if we do create them, to try to make sure they are treated ethically.

 By my definitions, are our LLMs or their simulated entities conscious? Are they sentient? I'm unsure, but since I rank consciousness and sentience as of lower importance, I'm not too worried about the answers to these questions from a moral standpoint. Still fascinated from a scientific standpoint, of course.

 

Also, I think that there's an even lower category than sentient. The example I like to use for this is a thermostat. It is agentic in that it is a system that responds behaviorally to changes in the environment (I'd call this a reflex perhaps, or stimulus/response pair), but it is not sentient because unlike a worm it doesn't have a computational system that attaches valence to these reflexes. I think that there are entities which I would classify as living beings that fall into the non-sentient category. For example: I think probably coral polyps and maybe jellyfish have computational systems too simplistic for valence and thus respond purely reflexively. If this is the case, then I would not torture a single worm to save any number of coral polyps. I think most (non ML) computer programs fall into this category. I think a reinforcement learning agent transcends this category, by having valenced reactions to stimuli, and thus should be considered at least comparable to sentient beings like insects.

Replies from: Nox ML
comment by Nox ML · 2023-05-30T20:20:10.891Z · LW(p) · GW(p)

I like the distinctions you make between sentient, sapient, and conscious. I would like to bring up some thoughts about how to choose a morality that I think are relevant to your points about death of cows and transient beings, which I disagree with.

I think that when choosing our morality, we should do so under the assumption that we have been given complete omnipotent control over reality and that we should analyze all of our values independently, not taking into consideration any trade-offs, even when some of our values are logically impossible to satisfy simultaneously. Only after doing this do we start talking about what's actually physically and logically possible and what trade-offs we are willing to make, while always making sure to be clear when something is actually part of our morality vs when something is a trade-off.

The reason for this approach is to avoid accidentally locking in trade-offs into our morality which might later turn out to not actually be necessary. And the great thing about it is that if we have not accidentally locked in any trade-offs into our morality, this approach should give back the exact same morality that we started off with, so when it doesn't return the same answer I find it pretty instructive.

I think this applies to the idea that it's okay to kill cows, because when I consider a world where I have to decide whether or not cows die, and this decision will not affect anything else in any way, then my intuition is that I slightly prefer that they not die. Therefore my morality is that cows should not die, even though in practice I think I might make similar trade-offs as you when it comes to cows in the world of today.

Something similar applies to transient computational subprocesses. If you had unlimited power and you had to explicitly choose if the things you currently call "transient computational subprocesses" are terminated, and you were certain that this choice would not affect anything else in any way at all (not even the things you think it's logically impossible for it not to affect), would you still choose to terminate them? Remember that no matter what you choose here, you can still choose to trade things off the same way afterwards, so your answer doesn't have to change your behavior in any way.

It's possible that you still give the exact same answers with this approach, but I figure there's a chance this might be helpful.

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-05-30T21:19:39.106Z · LW(p) · GW(p)

That's an interesting way of reframing the issue. I'm honestly just not sure about all of this reasoning, and remain so after trying to think about it with your reframing, but I feel like this does shift my thinking a bit. Thanks.

I think probably it makes sense to try reasoning both with and without tradeoffs, and then comparing the results.

comment by TAG · 2023-05-30T15:46:26.876Z · LW(p) · GW(p)

you’re objecting to me calling this “sentience” (in this post), e.g. because you think that word doesn’t adequately distinguish between “having sensory experiences” and “having someone home in the sense that makes that question matter”,

I don't see why both of those wouldn't matter in different ways.

comment by Youlian (youlian-simidjiyski) · 2023-05-30T18:02:05.116Z · LW(p) · GW(p)

I'm not the original poster here, but I'm genuinely worried about (c). I'm not sure that humanity's revealed preferences are consistent with a world in which we believe that all people matter. Between the large scale wars and genocides, slavery, and even just the ongoing stark divide between the rich and poor, I have a hard time believing that respect for sentience is actually one of humanity's strong core virtues. And if we extend out to all sentient life, we're forced to contend with our reaction to large scale animal welfare (even I am not vegetarian, although I feel I "should" be).

I think humanity's actual stance is "In-group life always matters. Out-group life usually matters, but even relatively small economic or political concerns can make us change our minds." We care about it some, but not beyond the point of inconvenience.

I'd be interested in finding firmer philosophical ground for the "all sentient life matters" claim. Not because I personally need to be convinced of it, but rather because I want to be confident that a hypothetical superintelligence with "human" virtues would be convinced of this.

(P.s. Your original point about "building and then enslaving a superintelligence is not just exceptionally difficult, but also morally wrong" is correct, concise, well-put, and underappreciated by the public. I've started framing my AI X-risk discussions with X-risk skeptics around similar terms.)

comment by cubefox · 2023-06-08T22:08:57.702Z · LW(p) · GW(p)

There are at least two related theories in which "all sentient beings matter" may be true.

  • Sentient beings can experience things like suffering, and suffering is bad. So sentient beings matter insofar as it is better that they experience more rather than less well-being. That's hedonic utilitarianism.

  • Sentient beings have conscious desires/preferences, and those matter. That would be preference utilitarianism.

The concepts of mattering or being good or bad (simpliciter) are intersubjective generalizations of the subjective concepts of mattering or being good for someone, where something matters (simpliciter) more, ceteris paribus, if it matters for more individuals.

comment by Vladimir_Nesov · 2023-05-29T22:01:29.537Z · LW(p) · GW(p)

There is a distinction between people being valuable, and their continued self-directed survival/development/flourishing being valuable. The latter doesn't require those people being valuable in the sense that it's preferable to bring them into existence, or to adjust them towards certain detailed shapes. So it's less sensitive to preference; it's instead a boundary concept [LW · GW], respecting sentience that's already in the world, because it's in the world, not because you would want more of it or because you like what it is or where it's going (though you might).

Replies from: M. Y. Zuo, lahwran
comment by M. Y. Zuo · 2023-05-29T22:32:04.124Z · LW(p) · GW(p)

How would one arrive at a value system that supports the latter but rejects the former?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-05-29T22:46:32.055Z · LW(p) · GW(p)

It's a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer). An example application is robustly leaving aliens alone even if you don't like them (without a compulsion to give them the universe), or closer to home leaving humans alone (in a sense where not stepping on them with your megaprojects is part of the concept), even if your preference doesn't consider them particularly valuable.

This makes the alignment target something other than preference, a larger target that's easier to hit. It's not CEV and leaves value on the table, doesn't make efficient use of all resources according to any particular preference. But it might suffice for establishing AGI-backed security against overeager maximizers [LW · GW], with aligned optimizers coming later, when there is time to design them properly.

Replies from: M. Y. Zuo, mikhail-samin
comment by M. Y. Zuo · 2023-05-29T22:50:50.205Z · LW(p) · GW(p)

 It's a boundary concept (element of a deontological agent design),...

What is this in reference to? 

The Stanford Encyclopedia of Philosophy has no reference entry for "boundary concept" nor any string matches at all to "deontological agent" or "deontological agent design".

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-05-29T23:09:54.140Z · LW(p) · GW(p)

It's a reference to Critch's Boundaries Sequence [? · GW] and related ideas [? · GW], see in particular the introductory post [LW · GW] and Acausal Normalcy [LW · GW].

It's an element of a deontological agent design in the literal sense of being an element of a design of an agent that acts in a somewhat deontological manner, instead of being a naive consequentialist maximizer, even if the same design falls out of some acausal society norm equilibrium [LW · GW] on consequentialist game theoretic grounds.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-29T23:50:06.917Z · LW(p) · GW(p)

I don't get this, it seems you're exclusively referencing another LW user's personal opinions?

I've never heard of this 'Andrew_Critch' or any of his writings before today, nor do they appear that popular, so I'm quite baffled.

Replies from: daniel-kokotajlo, Raemon, Vladimir_Nesov, TAG
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-05-30T16:36:03.224Z · LW(p) · GW(p)

Here's where I think the conversation went off the rails. :( I think what happened is M.Y.Zuo's bullshit/woo detector went off, and they started asking pointed questions about the credentials of Critch and his ideas. Vlad and LW more generally react allergically to arguments from authority/status, so downvoted M.Y.Zuo for making this about Critch's authority instead of about the quality of his arguments.

Personally I feel like this was all a tragic misunderstanding but I generally side with M.Y.Zuo here -- I like Critch a lot as a person & I think he's really smart, but his ideas here are far from rigorous clear argumentation as far as I can tell (I've read them all and still came away confused, which of course could be my fault, but still...) so I think M.Y.Zuo's bullshit/woo detector was well-functioning.

That said, I'd advise M.Y.Zuo to instead say something like "Hmm, a brief skim of those posts leaves me confused and skeptical, and a brief google makes it seem like this is just Critch's opinion rather than something I should trust on authority. Got any better arguments to show me? If not, cool, we can part ways in peace having different opinions."

Replies from: Raemon, M. Y. Zuo
comment by Raemon · 2023-05-30T17:55:47.125Z · LW(p) · GW(p)

[edit]

I appreciate the attempt at diagnosing what went wrong here. I agree this is ~where it went off the rails, and I think you are (maybe?) correctly describing what was going on from M.Y. Zuo's perspective. But this doesn't feel like it captured what I found frustrating.

[/edit]

What feels wrong to me about this is that, for the question of:

How would one arrive at a value system that supports the latter but rejects the former?

it just doesn't make sense to me to be that worried about either authority or rigor. I think the nonrigorous concept, generally held in society, of "respect people's boundaries/autonomy" is sufficient to answer the question, without even linking to Critch's sequence. Critch's sequence is a nice-to-have that sketches out a direction for how you might formalize this, but I don't get why this level of formalization is even particularly desired here.

(Like, last I checked we don't have any rigorous conceptions of functioning human value systems that actually work, either for respecting boundaries or aggregating utility or anything else. For purposes of this conversation this just feels like an isolated demand for rigor)

Replies from: ricraz, Raemon, daniel-kokotajlo
comment by Richard_Ngo (ricraz) · 2023-05-31T10:54:38.740Z · LW(p) · GW(p)

I think that there are many answers along these lines (like "I'm not talking about a whole value system, I'm talking about a deontological constraint") which would have been fine here.

The issue was that sentences like "It's a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer)" use the phrasing of someone pointing to a well-known, clearly-defined concept, but then only link to Critch's high-level metaphor.

Replies from: Raemon
comment by Raemon · 2023-05-31T20:59:59.475Z · LW(p) · GW(p)

Okay, I get where you're coming from now. I will have to mull over whether I agree, but I at least no longer feel confused about what the disagreement is.

comment by Raemon · 2023-05-30T22:47:41.918Z · LW(p) · GW(p)

(updated the previous comment with some clearer context-setting)

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-05-31T13:45:16.015Z · LW(p) · GW(p)

Thanks, & thanks for putting in your own perspective here. I sympathize with that too; fwiw Vladimir_Nesov's answer would have satisfied me, because I am sufficiently familiar with what the terms mean. But for someone new to those terms, they are just unexplained jargon, with links to lots of lengthy but difficult to understand writing. (I agree with Richard's comment nearby). Like, I don't think Vladimir did anything wrong by giving a jargon-heavy, links-heavy answer instead of saying something like "It may be hard to construct a utility function that supports the latter but rejects the former, but if instead of utility maximization we are doing something like utility-maximization-subject-to-deontological-constraints, it's easy: just have a constraint that you shouldn't harm sentient beings. This constraint doesn't require you to produce more sentient beings, or squeeze existing ones into optimized shapes." But I predict that this blowup wouldn't have happened if he had instead said that. 

I may be misinterpreting things of course, wading in here thinking I can grok what either side was thinking. Open to being corrected!

Replies from: Raemon
comment by Raemon · 2023-05-31T20:34:50.181Z · LW(p) · GW(p)

To be clear I super appreciate you stepping in and trying to see where people were coming from (I think ideally I'd have been doing a better job with that in the first place, but it was kinda hard to do so from inside the conversation)

I found Richard's explanation about what-was-up-with-Vlad's comment to be helpful.

comment by M. Y. Zuo · 2023-05-30T17:53:09.458Z · LW(p) · GW(p)

Thanks for the insight. After looking into Vladimir_Nesov's background, I would tend to agree that it was some issue with the phrasing of the parent comment that triggered the increasingly odd replies, rather than any substantive confusion.

At the time I gave him the benefit of the doubt in confusing what SEP is, what referencing an entry in an encyclopedia means, what I wanted to convey, etc., but considering there are 1505 seemingly coherent wiki contributions to the account's credit since 2009, these pretty common usages should not have been difficult to understand.

To be fair, I didn't consider his possible emotional states nor how my phrasing might be construed as being an attack on his beliefs. Perhaps I'm too used to the more formal STEM culture instead of this new culture that appears to be developing.

comment by Raemon · 2023-05-30T01:24:13.851Z · LW(p) · GW(p)

I don't get this, it seems you're exclusively referencing another LW user's personal opinions?

I'd describe this as "Critch listed a bunch of arguments, and the arguments are compelling."

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-30T01:29:23.006Z · LW(p) · GW(p)

I'm genuinely not seeing any linked or attached proofs for these arguments, whether logical, statistical, mathematical, etc.

EDIT: Can you link or quote to what you believe is a credible argument?

Replies from: Raemon
comment by Raemon · 2023-05-30T04:00:57.363Z · LW(p) · GW(p)

I think upon reflection I maybe agree that there isn't exactly an "argument" here – I think most of what Critch is doing is saying "here is a frame of how to think about a lot of game theoretic stuff." He doesn't (much) argue for that frame, but he lays out how it works, shows a bunch of examples, and basically is hoping (at this point) that the examples resonate.

(I haven't reread the whole sequence in detail but that was actually my recollection of it last time I read it)

So, I'll retract my particular phrasing here.

I do think that intuitively, boundaries exist, and as soon as they are pointed out as a frame that'd be good to formalize and incorporate into game/decision theory, I'm like "oh, yeah obviously." I don't know how much I think lawful-neutral aliens would automatically respect boundaries, but I would be highly surprised if they didn't at least include them as a term to be considered as they developed their coordination theories.

Your original comment said "How would one arrive at a value system that supports the latter but rejects the former?", Vlad said (paraphrased) "by invoking boundaries as a concept". If that doesn't make sense to you, okay, but, while I agree Critch doesn't quite argue for the concept's applicability, I do think he lays out a bunch of concepts and how they could relate, and this should at least be an existence proof that it is possible to develop a theory that accomplishes "care about allowing the continued survival of existing things without wanting to create more." And I still don't think it makes sense to summarize this as a "personal opinion." It's a framework, you can buy the framework or not.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-30T04:36:03.705Z · LW(p) · GW(p)

... Vlad said (paraphrased) "by invoking boundaries as a concept". If that doesn't make sense to you, okay, but, while I agree Critch doesn't quite argue for the concept's applicability, I do think he lays out a bunch of concepts and how they could relate, and this should at least be an existence proof that it is possible to develop a theory that accomplishes "care about allowing the continued survival of existing things without wanting to create more."

I appreciate the update. The actual meaning behind "invoking boundaries as a concept" is what I'm interested in, if that is the right paraphrase.

If it made intuitive sense then the question wouldn't have been asked, so you're right that the concepts could relate, but the crux is that this has not been proven to any degree. Thus, I'm still inclined to consider it a personal opinion.

For the latter part, I don't get the meaning; from what I understand, there's no such thing as 'should at least be an existence proof'.

There's 'proven correct', 'proven incorrect', 'unproven', 'conjecture', 'hypothesis', etc...

Replies from: Raemon
comment by Raemon · 2023-05-30T07:35:30.708Z · LW(p) · GW(p)

Why do you need more than one description of such a value system in order to answer your original question? This isn't about arguing the value system is ideal or that you should adopt it.

And, like, respecting boundaries is a pretty mainstream concept lots of people care about. 

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-30T12:11:13.989Z · LW(p) · GW(p)

Why do you need more than one description of such a value system in order to answer your original question?

I don't think I am asking for multiple descriptions of 'such a value system'.

What value system are you referring to and where does it appear I'm asking that?

Also, I'm not quite sure how 'respecting boundaries' relates to this discussion; is it something to do with the idea of 'invoking boundaries as a concept'?

comment by Vladimir_Nesov · 2023-05-30T01:12:51.100Z · LW(p) · GW(p)

Research is full of instances of having nothing to go on but the argument itself, not even a reason to consider the argument.

(Among Critch's legible contributions is Parametric Bounded Löb, wrapping up one line of research in modal embedded agency [? · GW]. See also the recent paper on open source game theory institution design, which works as an introduction with grounding in the informal motivations behind the topic and its relevance to the real world.)

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-30T01:21:01.371Z · LW(p) · GW(p)

The work seems interesting but none of it makes an individual's personal opinions a credible reference. If it was a group of folks with credible track records expressing a joint opinion in a conference, I'd be more willing to consider it, but literally a single individual just doesn't make sense.

Research is full of instances of having nothing to go on but the argument itself, not even a reason to consider the argument.

I'm not sure how to parse this, the commonly accepted view is that research is based on experiments, observations, logical proofs, mathematical proofs, etc... do you not believe this?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-05-30T01:27:40.191Z · LW(p) · GW(p)

It's not a "credible reference" in the sense of having behind it massive evidence of being probably worthwhile to study. But I in turn find the background demand for credible references (in their absence) baffling, both in principle [LW · GW] and given that it's not a constraint that non-mainstream research could survive under [LW · GW].

Replies from: ricraz, M. Y. Zuo
comment by Richard_Ngo (ricraz) · 2023-05-30T01:36:20.947Z · LW(p) · GW(p)

I personally think it's important to separate philosophical speculation from well-developed rigorous work, and Critch's stuff on boundaries seems to land well in the former category.

This is a communicative norm not an epistemic norm—you're welcome to believe whatever you like about Critch's stuff, but when you cite it as if it's widely-understood (across the LW community, or elsewhere) to be a credible, well-developed idea, then this undermines our ability to convey the ideas that are widely-understood to be credible.

Replies from: Vladimir_Nesov, TAG, TAG
comment by Vladimir_Nesov · 2023-05-30T01:48:20.487Z · LW(p) · GW(p)

important to separate philosophical speculation from well-developed rigorous work

Sure.

when you cite it as if it's widely-understood (across the LW community, or elsewhere) to be credible

I don't think I did though? My use of "reference" [LW(p) · GW(p)] was merely in the sense of explaining the intended meaning of the word "boundary" I used in the top level comment, so it's mostly about definitions and context of what I was saying. (I did assume that the reference would plausibly be understood, and I linked to a post [LW · GW] on the topic right there in the original comment [LW(p) · GW(p)] to gesture at the intended sense and context of the word. There's also been a post [LW · GW] on the meaning of this very word just yesterday.)

And then M. Y. Zuo started talking about credibility, which still leaves me confused about what's going on, despite some clarifying back and forth.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-30T02:10:18.799Z · LW(p) · GW(p)

And then M. Y. Zuo started talking about credibility, which still leaves me confused about what's going on, despite some clarifying back and forth.

A reference implies some associated credibility, as in the example found in comment #4:

The Stanford Encyclopedia of Philosophy has no reference entry for "boundary concept" nor any string matches at all to "deontological agent" or "deontological agent design".

e.g. referencing entries in an encyclopedia, usually presumed to be authoritative to some degree, which grants some credibility to what's written regarding the topic.

By the way, I'm not implying Andrew_Critch's credibility is zero, but it's certainly a lot lower than SEP's, so much so that I think most LW readers, who likely haven't heard of him, would sooner group his writings with random musings than with SEP entries.

Hence why I was surprised.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-05-30T02:21:18.612Z · LW(p) · GW(p)

Well, I'm pretty sure that's not what the word means, but in any case that's not what I meant by it [LW · GW], so that point isn't relevant to any substantive [LW · GW] disagreement, which does seem [LW(p) · GW(p)] present [LW(p) · GW(p)]; it's best to taboo [LW · GW] "reference" in this context.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-30T02:33:29.238Z · LW(p) · GW(p)

Well, I'm pretty sure that's not what the word means,

It appears you linked to tvtropes.org?

I'm fairly certain the widely accepted definition of 'reference' encompasses the idea of referencing entries in an encyclopedia. So in this case I wouldn't trust 'TVTropes' at all.

Here's Merriam-Webster:

reference

noun

ref·​er·​ence | ˈre-fərn(t)s, ˈre-f(ə-)rən(t)s

1: the act of referring or consulting

2: a bearing on a matter : RELATION
    // in reference to your recent letter

3: something that refers: such as

  a: ALLUSION, MENTION

  b: something (such as a sign or indication) that refers a reader or consulter to another source of information (such as a book or passage)

  c: consultation of sources of information

4: one referred to or consulted: such as

  a: a person to whom inquiries as to character or ability can be made

  b: a statement of the qualifications of a person seeking employment or appointment given by someone familiar with the person

  c(1): a source of information (such as a book or passage) to which a reader or consulter is referred

  (2): a work (such as a dictionary or encyclopedia) containing useful facts or information

comment by TAG · 2023-05-30T21:00:27.089Z · LW(p) · GW(p)

I personally think it’s important to separate philosophical speculation from well-developed rigorous work

Yes, but of course Critch is the tip of a rather large iceberg. Rationalists tend to think you should familiarise yourself with a mass of ideas virtually none of which have been rigorously proven.

comment by TAG · 2023-05-30T21:01:27.228Z · LW(p) · GW(p)
comment by M. Y. Zuo · 2023-05-30T01:36:22.460Z · LW(p) · GW(p)

But I in turn find the background demand for credible references (in their absence) baffling, both in principle [LW · GW] and given that it's not a constraint that non-mainstream research could survive under [LW · GW].

The writings linked don't exclude the possibility of 'non-mainstream research' having experiments, observations, logical proofs, mathematical proofs, etc...

In fact the opposite, that happens every day on the internet, including on LW at least once a week.

Did you intend to link to something else?

comment by TAG · 2023-05-30T20:56:02.079Z · LW(p) · GW(p)

Critch is a "local hero"...well known in rationalist circles.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-31T03:10:09.918Z · LW(p) · GW(p)

Huh, I would never have guessed that by looking at the karma his posts received on average. Guess that shows how misleading the karma score sometimes may be.

Replies from: TAG
comment by TAG · 2023-05-31T11:48:12.278Z · LW(p) · GW(p)

? He has over 3000 karma.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-05-31T16:22:23.602Z · LW(p) · GW(p)

? He has over 3000 karma.

I suggest to reread the first sentence.

... on average.

For example, if an account has 20 posts and 1000 post karma, that's still only an average of 50 per post, which would indicate the account holder is not that well known.

comment by Mikhail Samin (mikhail-samin) · 2023-05-29T22:56:18.467Z · LW(p) · GW(p)

If you were more like the person you wish to be, and you were smarter, do you think you'd still want our descendants to refrain from optimising, when needed, so as to leave alone beings who'd prefer to be left alone? If you would still think that, why is it not CEV?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-05-29T23:03:35.853Z · LW(p) · GW(p)

It's probably implied by CEV. The point is that you don't need the whole CEV to get it, it's probably easier to get, a simpler concept and a larger alignment target that might be sufficient to at least notkilleveryone, even if in the end we lose most of the universe. Also, you gain the opportunity to work on CEV and eventually get there, even if you have many OOMs less resources to work with. It would of course be better to get CEV before building ASIs with different values or going on a long value drift trip ourselves.

Replies from: Seth Herd
comment by Seth Herd · 2023-05-30T17:23:26.169Z · LW(p) · GW(p)

I'd suggest that long-term corrigibility is a still easier target. If respecting future sentients' preferences is the goal, why not make that the alignment target?

While boundaries are a coherent idea, imposing them in our alignment solutions would seem to very much be dictating the future rather than letting it unfold with protection from benevolent ASI.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-05-30T20:13:06.682Z · LW(p) · GW(p)

In an easy world, boundaries are neutral, because you can set up corrigibility on the other side to eventually get aligned optimization there. The utility of boundaries is for worlds where we get values alignment or corrigibility wrong, and most of the universe eventually gets optimized in at least somewhat misaligned way.

Slight misalignment concern also makes personal boundaries in this sense an important thing to set up first, before any meaningful optimization changes people, as people are different from each other and initial optimization pressure might be less than maximally nuanced.

So it's complementary and I suspect it's a shard of human values that's significantly easier to instill in this different-than-values role than either the whole thing or corrigibility towards it.

comment by the gears to ascension (lahwran) · 2023-05-31T11:18:49.227Z · LW(p) · GW(p)

I don't think your understanding of the boundaries/membranes idea is quite correct, though it is in fact relevant here.

comment by Richard_Kennaway · 2023-05-30T09:07:36.207Z · LW(p) · GW(p)

Here are five conundrums about creating the thing with alignment built in.

  1. The House Elf whose fulfilment lies in servitude is aligned.

  2. The Pig That Wants To Be Eaten is aligned.

  3. The Gammas and Deltas of "Brave New World" are moulded in the womb to be aligned.

  4. "Give me the child for the first seven years and I will give you the man." Variously attributed to Aristotle and St. Ignatius of Loyola.

  5. B. F. Skinner said something similar to (4), but I don't have a quote to hand, to the effect that he could bring up any child to be anything. Edit: it was J. B. Watson: "Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors."

It is notable, though, that the first three are fiction and the last two are speculation. (The fates of J.B. Watson's children do not speak well of his boast.) No-one seems to have ever succeeded in doing this.

ETA: Back in the days of GOFAI one might imagine, as the OP does, making the thing to be already aligned. But we know no more of how the current generation of LLMs work than we do of the human brain. We grow them, then train them with RLHF to cut off the things we don't like, like the Gammas and Deltas in artificial wombs. From the point of view of AI safety demonstrable before deployment, this is clearly a wrong method. That aside, is it moral?

Replies from: Buck
comment by Buck · 2024-03-30T18:31:40.721Z · LW(p) · GW(p)

@So8res [LW · GW]  I'd be really interested in how you thought about these, especially the house elf example.

comment by Gunnar_Zarncke · 2023-05-31T22:50:15.396Z · LW(p) · GW(p)

I disagree with many assumptions I think the OP is making. I think it is an important question, thus I upvoted the post, but I want to register my disagreement. The terms that carry a lot of weight here are "to matter", "should", and "sentience".

Not knowing exactly what the thing is, nor exactly how to program it, doesn't undermine the fact that it matters.

I agree that it matters... to humans. "Mattering" is something humans do. It is not in the territory, except in the weak sense that brains are in the territory. Instrumental convergence is in the territory, but which specific large classes matter is not. Maybe from instrumental convergence, we can infer the ability and tendency to cooperate with other agents. Though to make that precise we need to get a grip on what an agent is.

If we make sentient AIs, we should consider them people in their own right

I treat "should" as a request to coordinate on an objective, not as a moral realist judgment as you seem to do here ("in their own right" seems to indicate pathos).  

build it to care about that stuff--not coerce it

Unless you describe what you mean by "build" and "coerce" in operational terms, the ordinary meanings of these words as applied to humans do not tell me what they mean when applied to things far outside the distribution of what these words are usually applied to.

I see the challenge of building intuitions for the implied value judgments, but for that one needs to see concrete things in more detail. Without the details, this is sacred far-mode thinking that, yes, unites, but lacks the concreteness needed for actual solutions.

comment by MSRayne · 2023-05-30T11:43:38.937Z · LW(p) · GW(p)

Just to be That Guy, I'd like to remind everyone that animal sentience means that vegetarianism at the very least (and, because of the intertwined nature of the dairy, egg, and meat industries, most likely veganism) is a moral imperative, to the extent that your ethical values incorporate sentience at all. Also, I'd go further and say that uplifting to sophonce those animals that we can, once we are able to at some future time, is also a moral imperative, but that relies on reasoning and values I hold that may not be self-evident to others, such as the view that increasing the agency of an entity that isn't drastically misaligned with other entities is fundamentally good.

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-05-30T18:23:31.535Z · LW(p) · GW(p)

I disagree, for the reasons I describe in this comment: https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters?commentId=wusCgxN9qK8HzLAiw [LW(p) · GW(p)] 

I do admit to having quite a bit of uncertainty around some of the lines I draw. What if I'm wrong and cows do have a very primitive sort of sapience? That implies we should not raise cows for meat (but I still think it'd be fine to keep them as pets and then eat them after they've died of natural causes).

I don't have so much uncertainty about this that I'd say there is any reasonable chance that fish are sapient though, so I still think that even if you're worried about cows you should feel fine about eating fish (if you agree with the moral distinctions I make in my other comment).

Replies from: MSRayne
comment by MSRayne · 2023-05-31T18:58:40.899Z · LW(p) · GW(p)

We're not talking about sapience though, we're talking about sentience. Why does the ability to think have any moral relevance? Only possessing qualia, being able to suffer or have joy, is relevant, and most animals likely possess that. I don't understand the distinctions you're making in your other comment. There is one, binary distinction that matters: is there something it is like to be this thing, or is there not? If yes, its life is sacred, if no, it is an inanimate object. The line seems absolutely clear to me. Eating fish or shrimp is bad for the same reasons that eating cows or humans is. They are all on the exact same moral level to me. The only meaningful dimension of variation is how complex their qualia are - I'd rather eat entities with less complex qualia over those with more, if I have to choose. But I don't think the differences are that strong.

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-06-01T03:35:29.991Z · LW(p) · GW(p)

That is a very different moral position than the one I hold. I'm curious what your moral intuitions about the qualia of reinforcement learning systems say to you. Have you considered that many machine learning systems seem to have systems which would compute qualia much like a nervous system, and that such systems are indeed more complex than the nervous systems of many living creatures like jellyfish? 

Replies from: MSRayne
comment by MSRayne · 2023-06-02T11:05:40.097Z · LW(p) · GW(p)

I don't know what to think about all that. I don't know how to determine what the line is between having qualia and not. I just feel certain that any organism with a brain sufficiently similar to those of humans - certainly all mammals, birds, reptiles, fish, cephalopods, and arthropods - has some sort of internal experience. I'm less sure about things like jellyfish and the like. I suppose the intuition probably comes from the fact that the entities I mentioned seem to actively orient themselves in the world, but it's hard to say.

I don't feel comfortable speculating which AIs have qualia, or if any do at all - I am not convinced of functionalism and suspect that consciousness has something to do with the physical substrate, primarily because I can't imagine how consciousness can be subjectively continuous (one of its most fundamental traits in my experience!) in the absence of a continuously inhabited brain (rather than being a program that can be loaded in and out of anything, and copied endlessly many times, with no fixed temporal relation between subjective moments.)

comment by Christopher King (christopher-king) · 2023-05-31T14:30:07.699Z · LW(p) · GW(p)

I think this might lead to the tails coming apart [LW · GW].

As our world exists, sentience and being a moral patient are strongly correlated. But I expect that, since AI comes from an optimization process, it will hit points where this stops being the case. In particular, I think there are edge cases where perfect models of moral patients are not themselves moral patients.

comment by Boris Kashirin (boris-kashirin) · 2023-05-30T11:44:55.856Z · LW(p) · GW(p)

If some process in my brain is conscious despite not being part of my consciousness, it matters too! While I don't expect that to be the case, I think there is a bias against even considering such a possibility.

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-05-30T18:29:19.922Z · LW(p) · GW(p)

I agree, because I think we must reason about entities as computational processes and think about what stimuli they receive from the world (sentience) and what actions, if any, they undertake (agentiveness). However, I don't think it necessarily follows that terminating a conscious process is bad just because we've come to the moral conclusion that it's generally bad to non-consensually terminate humans. I think our moral intuitions need expansion and clarification when it comes to transient computational subprocesses like simulated entities (e.g. in our minds, or the ongoing processes of large language models). More of my thoughts on this here: https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters?commentId=wusCgxN9qK8HzLAiw [LW(p) · GW(p)]

comment by Jacy Reese Anthis (Jacy Reese) · 2023-05-31T15:06:28.298Z · LW(p) · GW(p)

Thanks for writing this, Nate. This topic is central to our research at Sentience Institute, e.g., "Properly including AIs in the moral circle could improve human-AI relations, reduce human-AI conflict, and reduce the likelihood of human extinction from rogue AI. Moral circle expansion to include the interests of digital minds could facilitate better relations between a nascent AGI and its creators, such that the AGI is more likely to follow instructions and the various optimizers involved in AGI-building are more likely to be aligned with each other. Empirically and theoretically, it seems very challenging to robustly align systems that have an exclusionary relationship such as oppression, abuse, cruelty, or slavery." From Key Questions for Digital Minds [LW · GW].

comment by michael_mjd · 2023-05-29T23:03:45.087Z · LW(p) · GW(p)

Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves, and have its values be ultimately that of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!

Replies from: zac-hatfield-dodds, dr_s
comment by Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-05-30T20:53:58.690Z · LW(p) · GW(p)

It seems strange to create something so far beyond ourselves, and have its values be ultimately that of a child or a servant.

Would you say the same of a steam engine, or Stockfish, or Mathematica? All of those vastly exceed human performance in various ways!

I don't see much reason to think that very very capable AI systems are necessarily personlike or conscious, or have something-it-is-like-to-be-them - even if we imagine that they are designed and/or trained to behave in ways compatible with and promoting of human values and flourishing. Of course if an AI system does have these things I would also consider it a moral patient, but I'd prefer that our AI systems just aren't moral patients until humanity has sorted out a lot more of our confusions.

Replies from: Vladimir_Nesov, michael_mjd
comment by Vladimir_Nesov · 2023-05-30T21:12:05.565Z · LW(p) · GW(p)

I'd prefer that our AI systems just aren't moral patients until humanity has sorted out a lot more of our confusions

I share this preference, but one of the confusions is whether our AI systems (and their impending successors) are moral patients. Which is a fact about AI systems and moral patienthood, and isn't influenced by our hopes for it being true or not.

comment by michael_mjd · 2023-05-31T03:16:30.082Z · LW(p) · GW(p)

If we know they aren't conscious, then it is a non-issue. A random sample from conscious beings would land on the SAI with probability 0. I'm concerned we create something accidentally conscious.

I am skeptical it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence.

It can, however, possibly have a non-conscious outer-program... and avoid simulating people. That seems like a reasonable proposal.

comment by dr_s · 2023-05-30T15:57:34.229Z · LW(p) · GW(p)

At which point maybe the moral thing is to not build this thing.

Replies from: Seth Herd
comment by Seth Herd · 2023-05-30T17:54:41.072Z · LW(p) · GW(p)

Sure, but that appears to be a non-option at this point in history.

It's also unclear, because the world as it stands is highly, highly immoral, and an imperfect solution could be a vast improvement.

Replies from: dr_s
comment by dr_s · 2023-05-31T10:46:53.820Z · LW(p) · GW(p)

Sure, but that appears to be a non-option at this point in history.

It is an option up to the point that it's actually built. It may be a difficult option for our society to take at this stage, but you can't talk about morality and, in the same breath, treat a choice with obvious ethical implications as a given mechanistic process we have no agency over. We didn't need to exterminate the natives of the Americas upon first contact, or to colonize Africa. We did it because it was the path of least resistance given the incentives in place at the time. But that doesn't make those acts moral. Very few are the situations where the easy path is also the moral one. They were just the default, absent a deliberate, significant, conscious effort not to do those things, and the sacrifices that would have required.

It's also unclear, because the world as it stands is highly, highly immoral, and an imperfect solution could be a vast improvement.

The world is a lot better than it used to be in many ways. Risking throwing it away out of a misguided sense of urgency, because you can't stand not seeing it become perfect within your lifetime, is selfishness, not commitment to moral duty.

comment by simon · 2023-05-30T17:06:53.565Z · LW(p) · GW(p)

In the long run, we probably want the most powerful AIs to follow extrapolated human values, which doesn't require them to be slaves. I would assume that extrapolated human values would want lesser sentient AIs not to be enslaved either, but I would not build that assumption into the AI at the start.

In the short run, though, giving AIs rights seems dangerous to me, as an unaligned but not yet superintelligent AI could use such rights as a shield against human interference while it gains more and more resources to self-improve.

comment by Oliver Sourbut · 2023-05-30T12:26:35.410Z · LW(p) · GW(p)

My strong guess is that AIs won't by default care about other sentient minds

nit: this presupposes that the de novo mind is itself sentient, which I think you're (rightly) trying to leave unresolved (because it is unresolved). I'd write

My strong guess is that AIs won't by default care about sentient minds, even if they are themselves sentient

(Unless you really are trying to connect alignment necessarily with building a sentient mind, in which case I'd suggest making that more explicit)

comment by Buck · 2024-04-02T15:59:53.295Z · LW(p) · GW(p)

The goal of alignment research is not to grow some sentient AIs, and then browbeat or constrain them into doing things we want them to do even as they'd rather be doing something else.

I think this is a confusing sentence, because by "the goal of alignment research" you mean something like "the goal I want alignment research to pursue" rather than "the goal that self-identified alignment researchers are pushing towards".

comment by michael_dello · 2023-06-09T01:02:02.409Z · LW(p) · GW(p)

"Brave New World" comes to mind. I've often been a little confused when people say that creating people who are happy with their role in life is a dystopia, when that sounds like the goal to me. Creating sentient minds that are happy with their lives seems much better than creating them randomly.

comment by OneManyNone (OMN) · 2023-06-08T19:45:56.339Z · LW(p) · GW(p)

I feel as if I can agree with this statement in isolation, but can't think of a context where I would consider this point relevant.

I'm not even talking about the question of whether or not the AI is sentient, which you asked us to ignore. I'm talking about how we can know that an AI is "suffering," even if we do assume it's sentient. What exactly is "suffering" in something that is completely cognitively distinct from a human? Is it just negative reward signals? I don't think so; or at least, if it were, that would likely imply that training a sentient AI is unethical in all cases, since training requires negative signals.

That's not to say that all negative signals are the same, or that they might not be painful in some contexts; just that I think determining this is an even harder problem than determining whether the AI is sentient.

comment by [deleted] · 2023-06-03T18:06:17.228Z · LW(p) · GW(p)

Thanks for the post! What follows is a bit of a rant. 

I'm a bit torn as to how much we should care about AI sentience initially. On one hand, ignoring sentience could lead us to do some really bad things to AIs. On the other hand, if we take sentience seriously, we might want to avoid a lot of techniques, like boxing, scalable oversight, and online training. In a recent talk [LW · GW], Buck compared humanity controlling AI systems to dictators controlling their population. 

One path we might take as a civilization is that we initially align our AI systems in an immoral way (using boxing, scalable oversight, etc) and then use these AIs to develop techniques to align AI systems in a moral way. Although this wouldn't be ideal, it might still be better than creating a sentient squiggle maximizer and letting it tile the universe. 

There are also difficult moral questions here, like if you create a sentient AI system with different preferences than yours, is it okay to turn it off?

comment by Lichdar · 2023-05-31T15:55:10.658Z · LW(p) · GW(p)

I believe that the easiest solution would be to not create sentient AI: one positive outcome described by Elon Musk was AI as a third layer of cognition, above the second layer of cortex and the first layer of the limbic system. He additionally noted that the cortex does a lot for the limbic system.

To the extent we can have AI become "part of our personal cognitive system" and thus be tied to our existence, this appears to mostly solve the problem, since its reproduction will be dependent on us and it is rewarded for empowering the individual. The ones that don't aren't created, so they "go extinct."

This could be done via a neural interface system that allows connectivity with our current brain, so that ultimately it very much becomes a part of us.

comment by Quinn (quinn-dougherty) · 2023-05-30T13:00:50.293Z · LW(p) · GW(p)

Failure to identify a fun-theoretic maximum is definitely not as bad as allowing suffering, but the opposite of this statement is, I think, an unstated premise in a lot of the "alignment = slavery" sort of arguments that I see.