Which values are stable under ontology shifts?
post by Richard_Ngo (ricraz) · 2022-07-23T02:40:04.344Z · LW · GW · 48 comments
Here’s a rough argument which I’ve been thinking about lately:
We have coherence theorems which say that, if you’re not acting like you’re maximizing expected utility over outcomes, you’d make payments which predictably lose you money. But in general I don't see a principled distinction between “predictably losing money” (which we see as incoherent) and “predictably spending money” (to fulfill your values): it depends on the space of outcomes over which you define utilities, which seems pretty arbitrary. You could interpret an agent being money-pumped as a type of incoherence, or as an indication that it enjoys betting and is willing to pay to do so; similarly, you could interpret an agent passing up a “sure thing” bet as incoherence, or just as a preference for not betting which it’s willing to forgo money to satisfy. Many humans have one of these preferences!
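To make the money-pump framing concrete, here's a toy sketch (illustrative Python; the fee, items, and cyclic preference are made up for the example): an agent with cyclic preferences pays a small fee for each "upgrade" and predictably ends up holding what it started with, minus the fees. Whether we call that incoherence or a willingness to pay for trades it enjoys is exactly the framing question above.

```python
# Toy money pump: an agent with cyclic preferences A > B > C > A pays a small
# fee for each swap to a "preferred" item and ends up back where it started,
# strictly poorer. All values here are illustrative.

FEE = 1  # amount the agent is willing to pay to swap to a preferred item

# hypothetical cyclic preference: maps currently-held item -> item it prefers
prefers = {"C": "B", "B": "A", "A": "C"}

def run_money_pump(start_item: str, money: int, rounds: int) -> int:
    """Repeatedly offer the agent the item it prefers, charging FEE per trade."""
    item = start_item
    for _ in range(rounds):
        item = prefers[item]   # agent accepts the trade...
        money -= FEE           # ...and pays for the privilege
    return money

# After 3 trades the agent holds C again but has predictably lost 3 units.
print(run_money_pump("C", money=10, rounds=3))  # -> 7
```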
Now, these preferences are somewhat odd ones, because you can think of every action under uncertainty as a type of bet. In other words, “betting” isn't a very fundamental category in an ontology which has a sophisticated understanding of reasoning under uncertainty. Then the obvious follow-up question is: which human values will naturally fit into much more sophisticated ontologies [LW · GW]? I worry that not many of them will:
- In a world where minds can be easily copied, our current concepts of personal identity and personal survival will seem very strange. You could think of those values as “predictably losing money” by forgoing the benefits of temporarily running multiple copies. (This argument was inspired by this old thought experiment [LW · GW] from Wei Dai.)
- In a world where minds can be designed with arbitrary preferences, our values related to “preference satisfaction” will seem very strange, because it’d be easy to create people with meaningless preferences that are by default satisfied to an arbitrary extent.
- In a world where we understand minds very well, our current concepts of happiness and wellbeing may seem very strange. In particular, if happiness is understood in a more sophisticated ontology as caused by positive reward prediction error, then happiness is intrinsically in tension with having accurate beliefs. And if we understand reward prediction error in terms of updates to our policy, then deliberately invoking happiness would be in tension with acting effectively in the world. (See the reward-prediction-error sketch after this list.)
- If there's simply a tradeoff between them, we might still want to sacrifice accurate beliefs and effective action for happiness. But what I'm gesturing towards is the idea that happiness might not actually be a concept which makes much sense given a complete understanding of minds - as implied by the buddhist view of happiness as an illusion, for example.
- In a world where people can predictably influence the values of their far future descendants, and there’s predictable large-scale growth, any non-zero discounting will seem very strange, because it predictably forgoes orders of magnitude more resources in the future. (See the toy discounting calculation after this list.)
- This might result in the strategy described by Carl Shulman of utilitarian agents mimicking selfish agents by spreading out across the universe as fast as they can to get as many resources as they can, and only using those resources to produce welfare once the returns to further expansion are very low. It does seem possible that we'll design AIs which spend millions or billions of years optimizing purely for resource acquisition, and then eventually use all those resources for doing something entirely different. But it seems like those AIs would need to have minds that are constructed in a very specific and complicated way to retain terminal values which are so unrelated to most of their actions.
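To pin down the reward-prediction-error point above, here's a minimal sketch using the temporal-difference error from reinforcement learning, which is one common formalisation (not necessarily the one intended here). If the agent's value estimates are accurate, these errors average out to roughly zero, so persistently positive "surprise" requires systematically pessimistic predictions - that's the tension with accurate beliefs.

```python
# A minimal sketch of reward prediction error in the temporal-difference
# sense: delta = r + gamma * V(s') - V(s). This is one standard gloss,
# not necessarily the one the post intends. Values here are illustrative.

gamma = 0.9                         # discount factor
V = {"s": 0.0, "s_next": 0.0}       # hypothetical state-value estimates

def td_update(reward: float, s: str, s_next: str, lr: float = 0.1) -> float:
    """TD(0) update: returns the prediction error and nudges V(s) toward accuracy."""
    delta = reward + gamma * V[s_next] - V[s]   # reward prediction error
    V[s] += lr * delta                          # learning shrinks future errors
    return delta

# Receiving the same reward repeatedly: the first errors are large and positive,
# then they shrink toward zero as V["s"] becomes an accurate prediction.
for step in range(10):
    print(step, round(td_update(reward=1.0, s="s", s_next="s_next"), 3))
```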
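And a toy calculation for the discounting point above (rates and horizons are illustrative): an agent discounting at a constant rate treats one unit of value now as equivalent to exponentially many units later, so even a tiny discount rate means passing up astronomically large future stakes for small near-term gains once the horizon is long.

```python
# Toy calculation: with a per-year discount rate delta, 1 unit of value now
# is treated as equivalent to exp(delta * T) units delivered T years later.
# Even delta = 1%/year means passing up ~10^43 future units for 1 unit now
# once the horizon reaches 10,000 years.

import math

delta = 0.01                      # illustrative per-year discount rate
for T in (100, 1_000, 10_000):
    breakeven = math.exp(delta * T)
    print(f"T = {T:>6} years: 1 unit now ~ {breakeven:.3g} units then")
# T =    100 years: 1 unit now ~ 2.72 units then
# T =   1000 years: 1 unit now ~ 2.2e+04 units then
# T =  10000 years: 1 unit now ~ 2.69e+43 units then
```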
A more general version of these arguments: human values are generalizations of learned heuristics for satisfying innate drives, which in turn are evolved proxies for maximizing genetic fitness. In theory, you can say “this originated as a heuristic/proxy, but I terminally value it”. But in practice, heuristics tend to be limited, messy concepts which don't hold up well under ontology improvement. So they're often hard to continue caring about once you deeply understand them - kinda like how it’s hard to endorse “not betting” as a value once you realize that everything is a kind of bet, or endorse faith in god as a value if you no longer believe that god exists. And they're especially hard to continue caring about at scale.
Given all of this, how might future values play out? Here are four salient possibilities:
- Some core notion of happiness/conscious wellbeing/living a flourishing life is sufficiently “fundamental” that it persists even once we have a very sophisticated understanding of how minds work.
- No such intuitive notions are strongly fundamental, but we decide to ignore that fact, and optimize for values that seem incoherent to more intelligent minds. We could think of this as a way of trading away the value of consistency.
- We end up mainly valuing something like “creating as many similar minds as possible” for its own sake, as the best extrapolation of what our other values are proxies for.
- We end up mainly valuing highly complex concepts which we can’t simplify very easily - like “the survival and flourishing of humanity”, as separate from the survival and flourishing of any individual human. In this world, asking whether an outcome is good for individuals might feel like asking whether human actions are good or bad for individual cells - even if we can sometimes come up with a semi-coherent answer, that’s not something we care about very much.
48 comments
Comments sorted by top scores.
comment by Thane Ruthenis · 2022-07-23T05:03:49.023Z · LW(p) · GW(p)
So they're often hard to continue caring about once you deeply understand them - kinda like how it’s hard to endorse “not betting” as a value once you realize that everything is a kind of bet, or endorse faith in god as a value if you no longer believe that god exists
If you have a professed value of "betting is bad", and then you learn that "everything" is betting, one of three things happens:
- You decide that if everything is betting, everything is bad. You do nothing. Your values are stable and consistent.
- You keep thinking that things like betting on horse races are bad while mundane actions like buying groceries are good. That means what you disprefer is not "betting" in full generality, but certain kinds of betting: the social activities that our culture considers central examples of betting. So you keep not betting on horses and keep buying groceries. Your values are stable and consistent.
- You decide that betting is good. That means "betting is bad" was not a terminal value, but a heuristic [LW · GW] that you thought helped you navigate to achieving your actual terminal values. So you start endorsing bets, including some horse bets. Your values are stable and consistent.
Similar with faith in God. Either you actually terminally value having faith in God so you refuse to change your mind ever, or you valued not "faith in God" but some correlates of that belief (like having a sense of spiritual purpose) so you just start optimizing for them directly with newfound clarity, or you didn't actually value faith in God but valued e. g. having an accurate model of reality and honestly believed that God existed.
Similar for the other examples. Either there are some concrete things corresponding to personal identity/preference satisfaction/etc., or we value not these things but some actually-existing correlates of these things, or acting like we value these things is a heuristic that instrumentally helps us arrive at good outcomes. Either way, ontology shifts don't do anything bad to our values.
We didn't stop valuing people when we figured out that "people" are just collections of atoms like everything else, instead of e. g. immortal souls. We just re-defined what we value in terms of the new ontology. As such, I think that (real) values are actually fully robust to ontology shifts [LW · GW].
Or, the other way around, perhaps "values" are defined by being robust to ontology shifts.
↑ comment by NickGabs · 2022-07-23T23:41:45.215Z · LW(p) · GW(p)
Or, the other way around, perhaps "values" are defined by being robust to ontology shifts.
This seems wrong to me. I don't think that reductive physicalism is true (i. e. the hard problem really is hard), but if I did, I would probably change my values significantly. Similarly for religious values; religious people seem to think that God has a unique metaphysical status such that his will determines what is right and wrong, and if no being with such a metaphysical status existed, their values would have to change.
↑ comment by Thane Ruthenis · 2022-07-24T03:34:10.084Z · LW(p) · GW(p)
Suppose that there's a kid who is really looking forward to eating the cake his mother promised to bake for him this evening. You might say he values this cake that he's sure exists as he's coming back home after school. Except, he shortly learns that there's no cake: his mother was too busy to make it. Do his values change?
Same with God. Religious people value God, okay. But if they found out there's no God, that doesn't mean they'd have to change their values; only their beliefs. They'd still be the kinds of people who'd value an entity like God if that entity existed. If God doesn't exist, and divine morality doesn't either, that'd just mean the world is less aligned with their values than they'd thought — like the kid who has less sweets than he'd hoped for. They'd re-define their policies to protect or multiply whatever objects of value actually do exist in the world, or attempt to create the valuable things that turned out not to exist (e. g., the kid baking a cake on his own).
None of that involves changes to values.
↑ comment by Vladimir_Nesov · 2022-07-24T09:20:09.842Z · LW(p) · GW(p)
Absent cake on a particular evening is a much more decisive failure than nonexistence of an abstract deity, because the abstract deity can be channeled and interacted with in imagination, especially posthuman imagination. Its relevance to a lot of future thoughts and decisions isn't easily refuted by things like its absence on the table on a particular evening. Faith in strongly inaccessible cardinals doesn't falter because of their apparent nonexistence in the physical world.
↑ comment by Thane Ruthenis · 2022-07-24T09:29:38.087Z · LW(p) · GW(p)
Which is analogous to baking the cake yourself, sure.
↑ comment by Vladimir_Nesov · 2022-07-24T09:39:56.549Z · LW(p) · GW(p)
Nope, can't bake the cake in the past, that's why it's an actual refutation, a meaningful disanalogy with the deity case, the reason your arguments about ignoring nonexistent deities don't seem convincing, unlike the much more convincing argument about cake.
↑ comment by Thane Ruthenis · 2022-07-24T09:59:36.209Z · LW(p) · GW(p)
I think I'm missing some inferential step.
Faith in strongly inaccessible cardinals doesn't falter because of their apparent nonexistence in the physical world.
The hypothetical we're discussing is premised on a believer's faith being shaken. If it's shaken, but they refuse to admit the deity doesn't exist, and start acting like talking to their imaginary version of that deity is as good as interacting with it, that's... fine? I still don't see how any of that involves value changes.
↑ comment by Vladimir_Nesov · 2022-07-24T10:09:01.368Z · LW(p) · GW(p)
The hypothetical we're discussing is premised on a believer's faith being shaken.
The premise is that the deity is revealed to be nonexistent, not that faith in the deity is shaken. I'm arguing that it's coherent, and indeed the correct all-else-equal outcome, for faith to remain unshaken by the discovery that the deity doesn't physically exist. This does involve admitting that it doesn't physically exist; there's no confusion/denial about that. (There are also no value changes, but we don't even get to the point of worrying about those.)
↑ comment by Shiroe · 2022-07-24T10:21:30.117Z · LW(p) · GW(p)
If a deity is revealed to be physically nonexistent, that is perfectly in order because deities are supernatural rather than physical. But if it's revealed that a deity is totally nonexistent i.e. that there is no entity which is the referent of its name, then that is equivalent to faith in it being shaken.
↑ comment by Vladimir_Nesov · 2022-07-24T10:57:30.728Z · LW(p) · GW(p)
Usually you just need to find appropriate axioms, because if morally the object exists (which manifests in the ability to reason about it and in the facts that were initially discovered), it doesn't matter very much if the original formulation of it didn't work for some technical reason, say led to a formal contradiction in a convoluted way that doesn't scream a natural explanation for the inevitability of the contradiction. This does seem like a bit of an ontological crisis, but the point is that threatening total nonexistence is very hard once you have even an informal understanding of what it is you are talking about, at most you get loss of relevance.
We have things that a given mathematician believes to be true, things that are proved to be true, and mediating between them things that have a moral reason to be true. Or, if you like, things that ought to be true, or as some mathematicians say: things that are morally true.
- E Cheng (2004), Mathematics, morally
↑ comment by Shiroe · 2022-07-24T11:23:13.757Z · LW(p) · GW(p)
but the point is that threatening total nonexistence is very hard once you have even an informal understanding of what it is you are talking about, at most you get loss of relevance
True, it's hard, but it does happen. As evidenced by the many of us who actually have become apostates of our native religions. After becoming convinced that the central thesis of the faith was in error, the jig was up.
EDIT: I think your way of phrasing it is descriptively the most accurate, because it's psychologically quite possible to resist the act of apostasy despite not actually believing in the truth of your own beliefs. However, for many of us, we would consider such a person a non-believer and hence a de facto apostate, even if they didn't think of themselves that way.
↑ comment by Vladimir_Nesov · 2022-07-24T11:43:54.026Z · LW(p) · GW(p)
threatening total nonexistence is very hard
True, it's hard, but it does happen. As evidenced by the many of us who become apostates
You are misplacing the referent of "it". I was talking of abstract total nonexistence, not loss of worship. These are different things.
Loss of worship doesn't witness total nonexistence, it's possible to stop worshipping a thing that exists even physically, and to care less about a thing that exists abstractly. The fact that something is not worshiped is not any sort of argument for its abstract total nonexistence.
The argument whose applicability (not validity) I was contesting was that abstract total nonexistence implies loss of worship. I talked of how abstract total nonexistence of something previously informally motivated is unusual and doesn't normally happen; what normally happens is only a quantitative loss of degree of relevance (generic moral worth). And so the argument rarely applies, because its premise rarely triggers, and it doesn't deliver the conclusion of loss of worship.
The loss of worship itself can of course happen for other reasons, I wasn't discussing this point.
↑ comment by Shiroe · 2022-07-24T12:00:49.492Z · LW(p) · GW(p)
Yes, I agreed in my edit that "worship"/"loss-of-worship" are possible necessary and sufficient correlates of "non-apostate"/"apostate" depending on your definition. However, one might say that worship is not sufficient; what is also required is belief.
↑ comment by Vladimir_Nesov · 2022-07-24T12:36:37.428Z · LW(p) · GW(p)
However, one might say that worship is not sufficient; what is also required is belief.
Not sufficient for non-apostasy? What is "belief"? Some things abstractly exist, as coherent ideas. They don't exist in the physical world. They matter or not in some way. Where's "belief" in this, existence in the physical world specifically? Surely not, since then what is the relevance of talking about abstract total nonexistence?
What I would call "belief" or even "existence" in a sense that generalizes beyond the physical is moral relevance, things you care about and take into account in decision making. There is another thread [LW(p) · GW(p)] on this point under this very post. In these terms, deities more strongly exist for believers and weakly exist for non-believers, with relevance for non-believers gained from their channeling via imagination of believers.
↑ comment by Thane Ruthenis · 2022-07-24T10:17:59.121Z · LW(p) · GW(p)
Fair enough, I suppose I was using imprecise language. Stated this way, I don't disagree.
↑ comment by Shiroe · 2022-07-24T09:46:42.177Z · LW(p) · GW(p)
It seems like Thane Ruthenis is claiming rather that even if one accepted that their religious view was refuted, that still wouldn't destroy the believer's value system, or to quote directly [LW(p) · GW(p)]: "Either way, ontology shifts don't do anything bad to our values."
↑ comment by Vladimir_Nesov · 2022-07-24T10:00:40.536Z · LW(p) · GW(p)
I'm not responding to what happens if a deity loses relevance in one's thinking. I'm arguing that deity's nonexistence in physical reality is by itself not a reason at all for it to lose relevance (or a central place) in one's thinking, that such relevance can coherently persevere on its own, with no support from reality.
↑ comment by Thane Ruthenis · 2022-07-24T10:06:04.285Z · LW(p) · GW(p)
I think we may be using different definitions of "value". There's a "value" like "what this agent is optimizing for right now", and a "value" like "the cognitive structure that we'd call this agent's terminal values if we looked at a comprehensive model of that agent". I'm talking about the second type.
And e. g. the Divine Command model of morality especially has nothing to do with that second type. It's explicitly "I value the things God says to value because He says to value them". Divine Command values are explicitly instrumental, not terminal.
↑ comment by Shiroe · 2022-07-24T10:12:51.353Z · LW(p) · GW(p)
Under your latter definition, could an agent be surprised by learning what its values are?
↑ comment by Thane Ruthenis · 2022-07-24T10:21:08.760Z · LW(p) · GW(p)
Yes, very much so.
↑ comment by NickGabs · 2022-07-25T22:06:58.943Z · LW(p) · GW(p)
With regard to God specifically, belief in God is somewhat unique because God is supposed to make certain things good in virtue of his existence; the value of the things religious people value is predicated on the existence of God. In contrast, the value of cake to the kid is not predicated on the actual existence of the cake.
↑ comment by Shiroe · 2022-07-24T08:17:24.397Z · LW(p) · GW(p)
They'd re-define their policies to protect or multiply whatever objects of value actually do exist in the world, or attempt to create the valuable things that turned out not to exist (e. g., the kid baking a cake on his own).
Well, yes, assuming there still are other things of value for them, at least potentially. But if a robot had been a hedonistic consequentialist, and then learned that it and all beings in its universe lacked any phenomenal experiences (and that such experiences were impossible in its universe), that robot wouldn't have any further set of objects to value while still being a hedonistic consequentialist.
You seem to be saying: well, when people say "all I really value is X" they don't really mean that literally. But sometimes, for some agents, it is true that all they value is X. In such cases, there is no further set of things to still value. Utilitarians might fit this description of such an agent. This is why NickGabs had mentioned the hard problem of consciousness.
↑ comment by Thane Ruthenis · 2022-07-24T08:21:45.503Z · LW(p) · GW(p)
Sure. So the universe is worthless for such agents. I don't see how that's a problem with my model...?
↑ comment by Shiroe · 2022-07-24T08:40:42.410Z · LW(p) · GW(p)
Okay, yes. One interpretation of the consequentialism I mentioned degrades gracefully like you say, giving "all zeroes" to a world of mere automata. (It's a point in favor of such a consequentialism that it can never become internally inconsistent.) But what you're saying in the general case doesn't fit what people usually mean when they talk about their values. A world without a God is not merely a world with lower utility for a theist, because the concept of "world" is already defined in terms of a God for the theist to begin with. You think otherwise perhaps because you have carved the concept-space up in such a way that lets you take apart different ideas like "world", "God", "value" and evaluate them separately. But this is not the case for the theist. Otherwise, they would not be a theist.
You're right that it ought to work that way, though.
↑ comment by Thane Ruthenis · 2022-07-24T08:52:48.852Z · LW(p) · GW(p)
... And so when such a theist discovers that their world-model is fundamentally wrong, they will re-carve it in a way such that they can take "God" out of it, and then find things they can value in the world that actually exists, or, failing that, would assign it zero value and completely break down. No?
I mean, what are you suggesting? What do you think a theist who discovers that God doesn't exist would do, that wouldn't fit into one of the three scenarios I'd outlined above [LW(p) · GW(p)]?
↑ comment by Shiroe · 2022-07-24T08:59:26.949Z · LW(p) · GW(p)
A theist who has apostatized may still be the same person afterwards, but by no means are they the same agency. One agent has simply died, being replaced with a new one.
↑ comment by Thane Ruthenis · 2022-07-24T09:05:35.442Z · LW(p) · GW(p)
I'd be genuinely interested if you could elaborate on that, including the agency vs. personality distinction you're making here.
↑ comment by Shiroe · 2022-07-24T09:34:16.073Z · LW(p) · GW(p)
Okay. Let me back up a bit. You had said [LW(p) · GW(p)]:
Either there are some concrete things corresponding to personal identity/preference satisfaction/etc., or we value not these things but some actually-existing correlates of these things, or acting like we value these things is a heuristic that instrumentally helps us arrive at good outcomes. Either way, ontology shifts don't do anything bad to our values.
Leading to this claim, which is very alluring to me, even if I disagree with it:
Or, the other way around, perhaps "values" are defined by being robust to ontology shifts.
You gave the example of us learning that our world is physical, which didn't make us value humans less, even if we no longer believe that they have immortal souls. That's true, we still value humans. But the problem is that the first group of agents (the medieval "we") is a different group than the second group (the post-industrial "we"). Each group has a different concept of "human". The two concepts roughly map to the same targets in the territory, but the semantic meaning is different in a way that is crucial to the first group of agents. The first group weren't convinced that they were wrong. They simply got replaced by a new group who inherited their title.
That might sound dramatic, but ask a truly committed religious person if they can imagine themselves not believing in God. They will not be able to imagine it. This is because, in fact, they would no longer be the same agency in the sense that is crucial to them. Society would legally consider them the same person, and their apostate future-self might claim the same title as them and even deny that the change really mattered, but the original entity who was asked, the theistic agent, would no longer recognize any heir as legitimate at that point. From its point of view, it has simply died. That is why the theist cannot even imagine becoming an apostate, even if they grant that at any given time there's a greater-than-zero chance of becoming one.
It's true what you say, if you're talking about people i.e. biological human organisms. Such entities will always be doing something in the world up until the very moment they physically expire, including having their whole world view shattered and living in the aftermath of that. However, pre-world-view-shattering them would not recognize post-world-view-shattering them as a legitimate heir. The person might be the same, but it's a different agency in control.
Similar things can be said about populations and governments.
↑ comment by Thane Ruthenis · 2022-07-24T10:49:36.102Z · LW(p) · GW(p)
Thanks for elaborating!
I think I get what you mean. Human minds' "breakdown protocol" in case the universe turns out to be empty of value isn't to just shut down; the meat keeps functioning, so what happens is the gradual re-assembly of a new mind/agent from the pieces of the old one. Does that pass the ITT?
But I remain unconvinced that this is what happens during a crisis of faith. By themselves, the theist's refusal to admit their future apostate self as an heir, and their belief that they'd die if they were to lose faith, don't mean much to me beyond "the religion memeplex encourages its hosts to develop such beliefs to strengthen its hold on them". Especially if the apostate later denies that these beliefs were true.
And while my model of a devout mind in the middle of a crisis of faith is indeed dramatic and involves some extreme mental states... I'm unconvinced that this involves changes to terminal values. The entire world-model and the entire suite of instrumental values being rewritten is dramatic enough on its own, and "terminal values" slot nicely into the slot of "what's guiding this rewriting"/"what's the main predictor of what the apostate's instrumental values will be".
↑ comment by Shiroe · 2022-07-24T11:08:12.107Z · LW(p) · GW(p)
Building off of your other answer here [LW(p) · GW(p)], I think I can imagine at least one situation where your terminal values will get vetoed. Imagine that you discovered, to your horror, that all of your actions up until now have been subconsciously motivated to bring about doomsday. Causing the death of everyone is actually your terminal goal, which you were ignorant of. Furthermore, your subconsciously motivated actions actually have been effective at bringing the world closer and closer to its demise. Your only way to divert this now is to throw yourself immediately out of the window to your death, thereby averting your own terminal goal.
Would you do this?
Dr. Jekyll and Mr. Hyde are the same person, but are they really the same agent?
↑ comment by Thane Ruthenis · 2022-07-24T11:32:39.096Z · LW(p) · GW(p)
Suppose I would end up walking out the window. And it would be the wrong action for me to take. I would be foiled by a bunch of bad heuristics and biases I'd internalized over the course of my omnicidal plot. There would be no agent corresponding to me whose values would be satisfied by this.
It would be not unlike, say, manipulating and gaslighting someone until they decide to kill their entire family. This would be against the values the person would claim as their "truer" ones, but in the moment, under the psychological pressure and the influence of some convincing lies, it'd (incorrectly) feel to them like a good idea.
comment by Vladimir_Nesov · 2022-07-23T05:21:36.343Z · LW(p) · GW(p)
I think there is a dire scarcity of values: culture didn't have time to catch up to awareness of anything close to the current situation, and now the situation is going to shift again. So it's not as important which values survive under ontology shifts as which values grow under stable ontology, since these unknowns might hold more influence than what's currently salient.
comment by Charlie Steiner · 2022-07-23T02:59:13.640Z · LW(p) · GW(p)
Not everything is a type of bet.
You say
And if we understand reward prediction error in terms of updates to our policy, then deliberately invoking happiness would be in tension with acting effectively in the world.
And I think "acting effectively to do what?"
I think there's a completely implicit answer you give in this post: agents will give up everything good so that they can be more effective replicators in a grim Malthusian future.
Which... sure. This is why we should avoid a grim Malthusian future.
↑ comment by Richard_Ngo (ricraz) · 2022-07-23T04:41:17.631Z · LW(p) · GW(p)
Edited to clarify that this isn't what I'm saying. Added:
If there's simply a tradeoff between them, we might still want to sacrifice accurate beliefs and effective action for happiness. But what I'm gesturing towards is the idea that happiness might not actually be a concept which makes much sense given a complete understanding of minds - as implied by the buddhist view of happiness as an illusion, for example.
↑ comment by Charlie Steiner · 2022-07-23T12:53:15.760Z · LW(p) · GW(p)
Alright.
But it's not like happiness is the Tooth Fairy. It's an honest ingredient in useful models I have of human beings, at some level of abstraction. If you think future decision-makers might decide happiness "isn't real," what I hear is that they're ditching those models of humans that include happiness, and deciding to use models that never mention it or use it.
And I would consider this a fairly straightforward failure to align with my meta-preferences (my preferences about what should count as my preferences). I don't want to be modeled in ways that don't include happiness, and this is a crucial part of talking about my preferences, because there is no such thing as human preferences divorced from any model of the world used to represent them.
I agree that some people are imagining that we'll end up representing human values in some implicitly-determined, or "objective" model of the world. And such a process might end up not thinking happiness is a good model ingredient. To me, this sounds like a solid argument to not do that.
comment by NickGabs · 2022-07-23T23:43:18.380Z · LW(p) · GW(p)
I think this is a good point and one reason to favor more CEV-style solutions to alignment, if they are possible, rather than solutions which make the values of the AI relatively "closer" to our original values.
↑ comment by Noosphere89 (sharmake-farah) · 2022-07-24T00:25:32.943Z · LW(p) · GW(p)
Eh, CEV got rightly ditched as an actual solution to the alignment problem. The basic problem is it assumed that there was an objective moral reality, and we have little evidence of that. It's very possible morals are subjective, which outright makes CEV non-viable. May that alignment solution never be revived.
↑ comment by Vladimir_Nesov · 2022-07-24T01:02:10.260Z · LW(p) · GW(p)
CEV is a sketch of an operationalization of carefully deciding which goals end up being pursued, an alignment target. Its content doesn't depend on the philosophical status of such goals or on how CEV gets instantiated, such as whether it gets used directly in the 21st century by the first AGIs or whether it comes about later, when we need to get serious about making use of the cosmic endowment.
My preferred implementation of CEV (in the spirit of exploratory engineering) looks like a large collection of mostly isolated simulated human civilizations, where AGIs individually assigned to them perform prediction of CEV in many different value-laden ways (current understanding of values influences which details are predicted with morally relevant accuracy) and use it to guide their civilizations, depending on what is allowed by the rules of setting up a particular civilization. This as a whole gives a picture of path-dependency and tests prediction of CEV within CEV, so that it becomes possible to make more informed decisions on aggregation of results of different initial conditions (seeking coherence), and on choice of initial conditions.
The primary issue with this implementation is potential mindcrime, though it might be possible to selectively modulate the precision used to simulate specific parts of these civilizations to reduce moral weight of simulated undesirable events, or for the civilization-guiding AGIs to intervene where necessary.
The basic problem is it assumed that there was an objective moral reality, and we have little evidence of that. It's very possible morals are subjective, which outright makes CEV non-viable.
Do you mean by "objective moral reality" and morals "being subjective" something that interacts (at all) with the above description of CEV? Are you thinking of a very different meaning of CEV?
↑ comment by Noosphere89 (sharmake-farah) · 2022-07-24T01:07:00.485Z · LW(p) · GW(p)
I think I might be thinking of a very different kind of CEV.
↑ comment by Raemon · 2022-07-24T06:33:31.946Z · LW(p) · GW(p)
The basic problem is it assumed that there was an objective moral reality, and we have little evidence of that.
AFAICT this is false. CEV runs a check to see if human values turn out to cohere with each other (this says nothing about whether there is an objective morality), and if it finds that they do not, it gracefully shuts down.
My sense from reading the arbital post on it is that Eliezer still considers it the ideal sort of thing to do with an advanced AGI after we gain a really high degree of confidence in its ability to do very complex things (which admittedly means it's not very helpful for solving our immediate problems). I think some people disagree about it, but your statement as-worded seems mostly false to me.
(I recommend folks read the full article: https://arbital.com/p/cev/ )
comment by martinkunev · 2023-07-15T01:59:26.914Z · LW(p) · GW(p)
In "Against Discount Rates [LW · GW]" Eliezer characterizes discount rate as arising from monetary inflation, probabilistic catastrophes etc. I think in this light discount rate less than ONE (zero usually indicates you don't care at all about the future) makes sense.
Some human values are proxies to things which make sense in general intelligent systems - e.g. happiness is a proxy for learning, reproduction etc.
Self-preservation can be seen as an instance of preservation of learned information (which is a reasonable value for any intelligent system). Indeed, if there was a medium superior to a human brain to which people could transfer the "contents" of their brains, I believe most would do it. It is not a coincidence that self-preservation generalizes this way. Otherwise elderly people would have been discarded from the tribe in the ancestral environment.
comment by Noosphere89 (sharmake-farah) · 2022-09-11T22:40:17.209Z · LW(p) · GW(p)
This is a reason why the future could be very Lovecraftian, and it at least partially explains why (assuming moral realism is true) moral actions seem so disgusting to our intuitions.
comment by MSRayne · 2022-07-23T13:03:32.731Z · LW(p) · GW(p)
Truth is just that which it is useful to believe in order to maximize one's current values. Given that our values rely upon things that may not be "objectively real"... so much the worse for objective reality. I agree with other commenters that values are probably robust to changes in ontology, but let's not forget that we have the ability to simply refuse to change our ontology if doing so decreases our expected value. Rationality is about winning, not about being right.
↑ comment by Vladimir_Nesov · 2022-07-23T19:29:05.883Z · LW(p) · GW(p)
This is kinda-correct for reflectively stable goals, understood as including prior state of knowledge, pursued updatelessly: you form expectations [LW · GW] about what you care about, plan about what you care about, even if it's not the physical reality, even if that gets you destroyed in the physical reality. Probability is degree of caring [LW · GW], and it's possible to care about things other than reality. Still, probably such policies respond to observations of reality with sensible behaviors that appear to indicate awareness of reality, even if in some technical sense that's not what's going on. But not necessarily.
This only works for sufficiently strong consequentialists that can overcome limitations of their cognitive architecture that call for specialization in its parts, so that concluding that it's useful to form motivated beliefs is actually correct and doesn't just break [LW(p) · GW(p)] cognition [LW(p) · GW(p)].
Reflectively stable goals are not what's being discussed in this post. And probably agents with reflectively stable goals are always misaligned [LW · GW].
↑ comment by MSRayne · 2022-07-23T21:53:42.558Z · LW(p) · GW(p)
I'm trying very hard to understand the vector valued stuff in your links but I just cannot get it. Even reading about the risk-neutral probability thing - doesn't make any sense. Can you suggest some resources to get me up to speed on the reasoning behind all that?
↑ comment by Vladimir_Nesov · 2022-07-23T22:27:05.933Z · LW(p) · GW(p)
I've just fixed the LaTeX formatting in my post [LW · GW] on Jeffrey-Bolker rotation (I hadn't noticed it had completely broken until including the link). Its relevance here is as a self-contained, mathematically legible illustration of Wei Dai's point [LW · GW] on how probability can be understood as an aspect of an agent's decision algorithm. The point itself is more general and doesn't depend on this illustration.
Specifically, both the utility function and the prior probability distribution are data determining the preference ordering, and they mix on equal footing through Jeffrey-Bolker rotation. Informally reframed, neither utility nor probability is more fundamentally objective than the other, and both are "a matter of preference". At the same time, given a particular preference, there is no freedom to use probability that disagrees with it; that's determined by "more objective" considerations. This applies when we start with a decision algorithm already given (even if by normative extrapolation), rather than only with a world and a vague idea of how to act in it, where probability would be much more of its own thing.
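For readers bouncing off the linked post, here's a minimal sketch of the rotation as I understand it (my own reconstruction; check the post for the careful version). The idea is to pair each event's probability with its "shouldness" (probability times expected utility) and notice that rotating these pairs gives a different probability/utility decomposition of the same preference ordering.

```latex
% Sketch of a Jeffrey-Bolker rotation (reconstruction, not quoted from the linked post).
% For each event A, pair probability with "shouldness":
%   Q(A) := P(A) V(A), where V(A) is expected utility given A.
% Both P and Q are additive over disjoint events. Rotate every pair by a fixed angle theta:
\[
  P'(A) = \cos\theta \, P(A) - \sin\theta \, Q(A), \qquad
  Q'(A) = \sin\theta \, P(A) + \cos\theta \, Q(A).
\]
% The new expected utility is a monotone (fractional-linear) transform of the old one
% wherever P'(A) > 0, so (P', V') encodes the same preferences as (P, V):
\[
  V'(A) = \frac{Q'(A)}{P'(A)}
        = \frac{\sin\theta + \cos\theta \, V(A)}{\cos\theta - \sin\theta \, V(A)}.
\]
```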
comment by deepthoughtlife · 2022-07-24T17:33:46.965Z · LW(p) · GW(p)
As far as I can tell, you stopped a little short of really understanding your own examples, and the essay becomes muddled as it goes along, perhaps for that reason. Your initial paragraph was interesting, but then you didn't properly analyze the rest of it.
Someone could say that everything you do is betting, but that is clearly untrue. Betting is, in fact, a coherent category that is clearly different from buying groceries. Buying groceries: very low risk, moderate reward. Betting: high risk, high reward. I am against betting, but it is fully coherent for me to be in favor of buying groceries. Yes, investing in startups is betting. No, getting out of bed in the morning is not betting.
Even costless copies are not free if you value people who are that much like you. (Stated that way to avoid making a decision on whether they are you.) I don't want my copies to be slave labor, whether temporarily or permanently, and I gain nothing from copies of me making and using their own money (whether or not I value them as people).
Preference satisfaction should obviously not involve creating wire-headed beings. (It isn't a great measure for other reasons either, but that is a separate argument.)
Happiness is obviously a coherent concept. It is an emotional result of things being good by my values, not of surprise. When I listen to my favorite singers, I'm not at all surprised by how well they sing, nor by how well the music is made, but I am very happy (and I usually also feel whatever emotions the song conveys). I can reliably be happy when I want to be (though I spend very little time on being happy for the sake of being happy; I find wire-heading uninteresting even though I would obviously enjoy it). Happy-and-surprised is a separate thing from just happy. Giving up the happiness that comes from surprise is a completely different thing from giving up happiness itself. Understanding the truth better usually makes me happier too.
Time discounting is not totally reliant on certainty of influencing the future. One of my values could very well be caring about the near future more than the distant future because of causal proximity, not causal certainty. I'm sure that I have a small but noticeable preference for now over then even ignoring that my effects are more certain now; far enough into the future, even something very large means nothing to me. It would need to matter to me fundamentally, rather than through simple multiplication of some future value.
I value things that are close to me. I care a great deal about the people I love existing, but only a little bit about people extremely like them existing in the far future. However, if it was actually them, then I would truly care. (Some things have a large discount for time, other, seemingly similar things have basically none.) The exact details of my values matter, not some generic size of impact and distance.
It's obvious we have fundamental values. As obvious as that I am conscious right now. (It's logically possible I could be fundamentally altered to have different values, but I can comfortably say the result wouldn't be me.)
If the heuristics are wrong, just change them. Heuristics are heuristics of our values, not our values themselves, which is why people change the heuristics after being convinced they are unhelpful. Heuristics are not the fundamental thing, so them being messy isn't really all that important a point here. I don't mind changing my heuristics.
Is happiness something I fundamentally value? Maybe so, maybe not (again, anti-wireheading). I do, however, fundamentally value the state of the universe being a certain way, and how we respond to it (key point). (For instance, I value true love regardless of whether that leads to greater utility.) The explanations would be very long, tedious, and require a lot of soul searching, but they exist.
Perhaps you are misled because you assume everything is utilitarian at heart? Virtue ethics would not make these mistakes. Nor would many other philosophies. Doing the math is just one virtue.