LessWrong 2.0 Reader
Exploring the levels of sentience of, and our moral obligations towards, AI systems is such a nerd snipe and a mental vortex!
We made one of the largest-scale reductive moves when we ascribed moral concern only to people plus their property. That brought a load of problems associated with this simplistic ignorance, and one of them is the x-risk posed by high-tech property/production.
eleven-1 on AI and Non-Existence
This argument, The Valley Argument, occurred to me in the second half of 2024 and, to my knowledge, it is an original formulation that does not appear in the literature. The closest thing you can find in the literature is Pascal's mugging, or, on the topic of AI and suffering, something like Roko's basilisk.
I have not found a satisfying answer to the Valley Argument that does not involve either near-zero odds or a punitive afterlife. There are possible answers that involve neither, but in my view they are not satisfactory. You can discuss the argument with the OpenAI o1 or DeepSeek R1 models, and maybe they will come up with something that you find satisfactory - but I have not seen an answer that solves the argument.
In my view, given where we are right now - and the best knowledge we have now (not 5000 years from now) - the most promising path to avoiding the argument's conclusion is to assume a negative afterlife or to come up with some combination of answers (moral obligation + something else + something else). I have not seen an answer that works yet, but given our situation, for most people the negative-afterlife avenue would be the most convincing answer to the argument.
flandry39 on What if Alignment is Not Enough?
> Humans do things in a monolithic way,
> not as "assemblies of discrete parts".
Organic human brains have multiple aspects.
Have you ever had more than one opinion?
Have you ever been severely depressed?
> If you are asking "can a powerful ASI prevent
> /all/ relevant classes of harm (to the organic)
> caused by its inherently artificial existence?",
> then I agree that the answer is probably "no".
> But then almost nothing can perfectly do that,
> so therefore your question becomes
> seemingly trivial and uninteresting.
The level of x-risk harm and consequence
potentially caused by even one single mistake
of your angelic, super-powerfully enabled ASI
is far from "trivial" and "uninteresting".
Even one single bad relevant mistake
can be an x-risk when ultimate powers
and ultimate consequences are involved.
Either your ASI is actually powerful,
or it is not; either way, be consistent.
Unfortunately the 'Argument by angel'
only confuses the matter insofar as
we do not know what angels are made of.
"Angels" are presumably not machines,
but they are hardly animals either.
But arguing that this "doesn't matter"
is a bit like arguing that 'type theory'
is not important to computer science.
The substrate aspect is actually important.
You cannot simply just disregard and ignore
that there is, implied somewhere, an interface
between the organic ecosystem of humans, etc,
and that of the artificial machine systems
needed to support the existence of the ASI.
The implications of that are far from trivial.
That is what is explored by the SNC argument.
> It might well be likely
> that the amount of harm ASI prevents
> (across multiple relevant sources)
> is going to be higher/greater than
> the amount of harm ASI will not prevent
> (due to control/predictive limitations).
It might seem so, by mistake or perhaps by
accidental (or intentional) self-deception,
but this can only be a short-term delusion.
This has nothing to do with "ASI alignment".
Organic life is very very complex
and in the total hyperspace of possibility,
is only robust across a very narrow range.
Your cancer vaccine is within that range;
as it is made of the same kind of stuff
as that which it is trying to cure.
In the space of the kinds of elementals
and energies inherent in ASI powers
and of the necessary (side) effects
and consequences of its mere existence,
(as based on an inorganic substrate)
we end up involuntarily exploring
far far beyond the adaptive range
of all manner of organic process.
It is not just "maybe it will go bad",
but more like it is very very likely
that it will go much worse than you
can (could ever) even imagine is possible.
Without a lot of very specific training,
human brains/minds are not at all well equipped
to deal with exponential processes, and powers,
of any kind, and ASI is in that category.
Organic life is very very fragile
to the kinds of effects/outcomes
that any powerful ASI must engender
by its mere existence.
If your vaccine were made of neutronium,
then I would naturally expect some
very serious problems and outcomes.
Because I don’t care about “humanity in general” nearly as much as I care about my society. Yes, sure, the descendants of the Amish and the Taliban will cover the earth. That’s not a future I strive for. I’d be willing to give up large chunks of the planet to an ASI to prevent that.
I don't know how you would prevent that. Absent an AI catastrophe, fertility will recover, in the sense that "we" (rationalists etc) will mostly be replaced with people of low IQ and impulse control, exactly those populations that have the highest fertility now. And "banishing aging and death" would not prevent them from having high fertility and dominating the future. Moloch is relentless. The problem is more serious than you think.
maxime-riche on [deleted]
Density of Actual SFC Density
Add legends to the plots
maxime-riche on [deleted]
Maybe only keep the plot for the density of potential SFC at the start. Or merge both plots (using two Y-axes in that case).
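For concreteness, a minimal sketch of the merged-plot option (matplotlib, with two Y-axes via `twinx`; the arrays `x`, `density_potential_sfc`, and `density_actual_sfc` are hypothetical placeholders for the two densities):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical placeholder data standing in for the two density curves.
x = np.linspace(0.0, 10.0, 200)
density_potential_sfc = np.exp(-0.2 * x)
density_actual_sfc = 0.1 * np.exp(-0.5 * x)

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second Y-axis sharing the same X-axis

line_pot, = ax_left.plot(x, density_potential_sfc, color="tab:blue", label="Potential SFC density")
line_act, = ax_right.plot(x, density_actual_sfc, color="tab:orange", label="Actual SFC density")

ax_left.set_xlabel("x (placeholder)")
ax_left.set_ylabel("Potential SFC density", color="tab:blue")
ax_right.set_ylabel("Actual SFC density", color="tab:orange")

# One combined legend for both axes (this also addresses the "add legends" note above).
ax_left.legend(handles=[line_pot, line_act], loc="upper right")

plt.show()
```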
maxime-riche on [deleted]
Little progress has been made on this question. Most discussions stop after the following arguments:
Move these quotes to an appendix in an external document?
maxime-riche on [deleted]
Existing discussions
Move this section to an external document, as an appendix.
cubefox on Fertility Will Never Recover
I'm almost certainly somewhat of an outlier, but I am very excited about having 3+ children. My ideal number is 5 (or maybe more if I become reasonably wealthy). My girlfriend is also on board.
It's quite a different question whether you would really follow through on this, or whether either of you would change their preference and stop at a much lower number.
thomascederborg on A problem shared by many different alignment targets
I'm sorry if the list below looks like nitpicking. But I really do think that these distinctions are important.
Bob holds 1 as a value. Not as a belief.
Bob does not hold 2 as a belief or as a value. Bob thinks that someone as powerful as the AI has an obligation to punish someone like Dave. But that is not the same as 2.
Bob does not hold 3 as a belief or as a value. Bob thinks that for someone as powerful as the AI, the specific moral outrage in question renders the AI unethical. But that is not the same as 3.
Bob does hold 4 as a value. But it is worth noting that 4 does not describe anything load-bearing. The thought experiment would still work even if Bob did not think that the act of creating an unethical agent that determines the fate of the world is morally forbidden. The load-bearing part is that Bob really does not want the fate of the world to be determined by an unethical AI (and thus prefers the scenario where this does not happen).
Bob does not hold 5 as a belief or as a value. Bob prefers a scenario without an AI, to a scenario where the fate of the world was determined by an unethical AI. But that is not the same as 5. The description I gave of Bob does not in any way conflict with Bob thinking that most morally forbidden acts can be compensated for by expressing sincere regret at some later point in time. The description of Bob would even be consistent with Bob thinking that almost all morally forbidden acts can be compensated for by writing a big enough check. He just thinks that the specific moral outrage in question, directly means that the AI committing it is unethical. In other words: other actions are simply not taken into consideration, when going from this specific moral outrage, to the classification of the AI as unethical. (He also thinks that a scenario where the fate of the world is determined by an unethical AI is really bad. This opinion is also not taking any other aspects of the scenario into account. Perhaps this is what you were getting at with point 5).
I insist on these distinctions because the moral framework that I was trying to describe, is importantly different from what is described by these points. The general type of moral sentiment that I was trying to describe is actually a very common, and also a very simple, type of moral sentiment. In other words: Bob's morality is (i): far more common, (ii): far simpler, and (iii): far more stable, compared to the morality described by these points. Bob's general type of moral sentiment can be described as: a specific moral outrage renders the person committing it unethical in a direct way. Not in a secondary way (meaning that there is for example no summing of any kind going on. There is no sense in which the moral outrage in question is in any way compared to any other set of actions. There is no sense in which any other action plays any part whatsoever when Bob classifies the AI as unethical).
In yet other words: the link from this specific moral outrage to classification as unethical is direct. The AI doing nice things later is thus simply not related in any way to this classification. Plenty of classifications work like this. Allan will remain a murderer, no matter what he does after committing a murder. John will remain a military veteran, no matter what he does after his military service. Jeff will remain an Olympic gold winner, no matter what he does after winning that medal. Just as for Allan, John, and Jeff, the classification used to determine that the AI is unethical is simply not taking other actions into account.
The classification is also not the result of any real chain of reasoning. There is no sense in which Bob first concludes that the moral outrage in question should be classified as morally forbidden, followed by Bob then deciding to adhere to a rule which states that all morally forbidden things should lead to the unethical classification (and Bob has no such rule).
This general type of moral sentiment is not universal. But it is quite common. Lots of people can think of at least one specific moral outrage that leads directly to them viewing a person committing it as unethical (at least when committed deliberately by a grownup that is informed, sober, mentally stable, etc). In other words: lots of people would be able to identify at least one specific moral outrage (perhaps out of a very large set of other moral outrages). And say that this specific moral outrage directly implies that the person is unethical. Different people obviously do not agree on which subset of all moral outrages should be treated like this (even people that agree on what should count as a moral outrage can feel differently about this). But the general sentiment where some specific moral outrage simply means that the person committing it is unethical is common.
The main reason that I insist on the distinction is that this type of sentiment would be far more stable under reflection. There are no moving parts. There are no conditionals or calculations. Just a single, viscerally felt, implication. Attached directly to a specific moral outrage. For Bob, the specific moral outrage in question is a failure to adhere to the moral imperative to punish people like Dave.
Strong objections to the fate of the world being determined by someone unethical are not universal. But this is neither complex nor particularly rare. Let's add some details to make Bob's values a bit easier to visualise. Bob has a concept that we can call a Dark Future. It basically refers to scenarios where Bad People win The Power Struggle and manage to get enough power to choose the path of humanity (powerful anxieties along these lines seem quite common. And for a given individual it would not be at all surprising if something along these lines eventually turns into a deeply rooted, simple, and stable intrinsic value).
A scenario where the fate of the world is determined by an unethical AI is classified as a Dark Future (again in a direct way). For Bob, the case with no AI is not classified as a Dark Future. And Bob would really like to avoid a Dark Future. People who think that it is more important to prevent bad people from winning than to prevent the world from burning might not be very common. But there is nothing complex or incoherent about this position. And the general type of sentiment (that it matters a lot who gets to determine the fate of the world) seems to be very common. Not wanting Them to win can obviously be entirely instrumental. An intrinsic value might also be overpowered by survival instinct when things get real. But there is nothing surprising about something like this eventually solidifying into a deeply held intrinsic value. Bob does sound unusually bitter and inflexible. But there only needs to be one person like Bob in a population of billions.
To summarise: a non-punishing AI is directly classified as unethical. Additional details are simply not related in any way to this classification. A trajectory where an unethical AI determines the fate of humanity is classified as a Dark Future (again in a direct way). Bob finds a Dark Future to be worse than the no-AI scenario. If someone were to specifically ask him, Bob might say that he would rather see the world burn than see Them win. But if left alone to think about this, the world burning in the non-AI scenario is simply not the type of thing that is relevant to the choice (when the alternative is a Dark Future).
First I just want to again emphasise that the question is not if extrapolation will change one specific individual named Bob. The question is whether or not extrapolation will change everyone with these types of values. Some people might indeed change due to extrapolation.
My main issue with the point about moral realism is that I don't see why it would change anything (even if we only consider one specific individual, and also assume moral realism). I don't see why discovering that The Objectively Correct Morality disagrees with Bob's values would change anything (I strongly doubt that this sentence means anything. But for the rest of this paragraph I will reason from the assumption that it both does mean something, and that it is true). Unless Bob has very strong meta preferences related to this, the only difference would presumably be to rephrase everything in the terminology of Bob's values. For example: extrapolated Bob would then really not want the fate of the world to be determined by an AI that is in strong conflict with Bob's values (not punishing Dave directly implies a strong value conflict. The fate of the world being determined by someone with a strong value conflict directly implies a Dark Future. And nothing has changed regarding Bob's attitude towards a Dark Future). As long as this is stronger than any meta preferences Bob might have regarding The Objectively Correct Morality, nothing important changes (Bob might end up needing a new word for someone that is in strong conflict with Bob's values. But I don't see why this would change Bob's opinion regarding the relative desirability of a scenario that contains a non-punishing AI, compared to the scenario where there is no AI).
I'm not sure what role coherence arguments would play here.
It is the AI creating these successor AIs that is the problem for Bob (not the successor AIs themselves). The act of creating a successor AI that is unable to punish is morally equivalent to not punishing. It does not change anything. Similarly: the act of creating a lot of human level AIs is in itself determining the fate of the world (even if these successor AIs do not have the ability to determine the fate of the world).
I'm not sure I understand this paragraph. I agree that if the set is not empty, then a clever AI will presumably find an action that is a Pareto Improvement. But I am not saying that a Pareto Improvement exists and is merely difficult to find. I am saying that at least one person will demand X and that at least one person will refuse X, which means that a clever AI will just use its cleverness to confirm that the set is indeed empty.
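To make that concrete, here is a toy sketch (all names and utility numbers below are made up purely for illustration): if even one person's hard demand about X is strictly opposed by another person's hard refusal of X, then no candidate action can be weakly better for everyone, so the set of Pareto Improvements over the no-AI baseline is empty, and a cleverer search only confirms this faster.

```python
# Toy model: outcomes are dicts of per-person utilities; the baseline is the no-AI scenario.
# All names and numbers are hypothetical, purely to illustrate the empty-set claim.
status_quo = {"bob": 0.0, "dave": 0.0, "steve": 0.0}

candidate_actions = {
    "punish_dave":      {"bob": 1.0,  "dave": -1.0, "steve": 0.1},  # Bob demands X
    "dont_punish_dave": {"bob": -1.0, "dave": 1.0,  "steve": 0.1},  # Dave refuses X
    "do_nothing_else":  {"bob": 0.0,  "dave": 0.0,  "steve": 0.0},
}

def is_pareto_improvement(action, baseline):
    """Weakly better for everyone, strictly better for at least one person."""
    no_one_worse = all(action[p] >= baseline[p] for p in baseline)
    someone_better = any(action[p] > baseline[p] for p in baseline)
    return no_one_worse and someone_better

pareto_set = [name for name, utilities in candidate_actions.items()
              if is_pareto_improvement(utilities, status_quo)]
print(pareto_set)  # [] -- helping Bob hurts Dave, helping Dave hurts Bob, and doing neither improves on the baseline for no one
```

The point is not the numbers: a single pair of strictly opposed hard demands is enough to make the set empty, no matter how cleverly the rest of the outcome is arranged.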
I'm not sure that the following is actually responding to something that you are saying (since I don't know if I understand what you mean). But it seems relevant to point out that the Pareto constraint is part of the AIs goal definition. Which in turn means that before determining the members of the set of Pareto Improvements, there is no sense in which there exists a clever AI that is trying to make things work out well. In other words: there does not exist any clever AI, that has the goal of making the set non-empty. No one has, for example, an incentive to tweak the extrapolation definitions to make the set non-empty.
Also: in the proposal in question, extrapolated delegates are presented with a set. Their role is then supposed to be to negotiate about actions in this set. I am saying that they will be presented with an empty set (produced by an AI that has no motivation to bend rules to make this set non-empty). If various coalitions of delegates are able to expand this set with clever tricks, then this would be a very different proposal (or a failure to implement the proposal in question). This alternative proposal would for example lack the protections for individuals, that the Pareto constraint is supposed to provide. Because the delegates of various types of fanatics could then also use clever tricks to expand the set of actions under consideration. The delegates of various factions of fanatics could then find clever ways of adding various ways of punishing heretics into the set of actions that are on the table during negotiations (which brings us back to the horrors implied by PCEV). Successful implementation of Pareto PCEV implies that the delegates are forced to abide by the various rules governing their negotiations (similar to how successful implementation of classical PCEV implies that the delegates have successfully been kept in the dark regarding how votes are actually settled).
This last section is not a direct response to anything that you wrote. In particular, the points below are not meant as arguments against things that you have been advocating for. I just thought that this would be a good place to make a few points, that are related to the general topics that we are discussing in this thread (there is no post dedicated to Pareto PCEV, so this is a reasonable place to elaborate on some points related specifically to PPCEV).
I think that if one only takes into account the opinions of a group that is small enough for a Pareto Improvement to exist, then the outcome would be completely dominated by people that are sort of like Bob, but that are just barely possible to bribe (for the same reason that PCEV is dominated by such people). The bribe would not primarily be about resources, but about what conditions various people should live under. I think that such an outcome would be worse than extinction from the perspective of many people that are not part of the group being taken into consideration (including from the perspective of people like Bob. But also from the perspective of people like Dave). And it would just barely be better than extinction for many in that group.
I similarly think that if one takes the full population, but bends the rules until one gets a non-empty set of things that look sort of close to Pareto Improvements, then the outcome will also be dominated by people like Bob (for the same reason that PCEV is dominated by people like Bob). Which in turn implies a worse-than-extinction outcome (in expectation, from the perspective of most individuals).
In other words: I think that if one goes looking for coherent proposals that are sort of adjacent to this idea, then one would tend to find proposals that imply very bad outcomes. For the same reasons that proposals along the lines of PCEV imply very bad outcomes. A brief explanation of why I think this: if one tweaks this proposal until it refers to something coherent, then Steve has no meaningful influence regarding the adoption of those preferences that refer to Steve. Because when one is transforming this into something coherent, then Steve cannot retain influence over everything that he cares about strongly enough (as this would result in overlap). And there is nothing in this proposal that gives Steve any special influence regarding the adoption of those preferences that refer to Steve. Thus, in adjacent-but-coherent proposals, Steve will have no reason to expect that the resulting AI will want to help Steve, as opposed to wanting to hurt Steve.
It might also be useful to zoom out a bit from the specific conflict between what Bob wants and what Dave wants. I think that it would be useful to view the Pareto constraint as many individual constraints. This set of constraints would include many hard constraints. In particular, it would include many trillions of hard individual-to-individual constraints (including constraints coming from a significant percentage of the global population, who have non-negotiable opinions regarding the fates of billions of other individuals). It is an equivalent but more useful way of representing the same thing. (In addition to being quite large, this set would also be very diverse. It would include hard constraints from many different kinds of non-standard minds. With many different kinds of non-standard ways of looking at things. And many different kinds of non-standard ontologies. Including many types of non-standard ontologies that the designers never considered). We can now describe alternative proposals where Steve gets a say regarding those constraints that only refer to Steve. If one is determined to start from Pareto PCEV, then I think that this is a much more promising path to explore (as opposed to exploring different ways of bending the rules until every single hard constraint can be simultaneously satisfied).
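To make the many-individual-constraints framing concrete, here is a minimal sketch (the `HardConstraint` structure and all names and demands are hypothetical): the aggregate constraint is just the conjunction of many holder-to-target hard constraints, and this representation also makes it easy to pick out the subset of constraints that refer to a given individual, which is the subset that the alternative proposals mentioned above would give that individual a say over.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Outcome = Dict[str, float]  # hypothetical: per-person utilities in a candidate outcome

@dataclass
class HardConstraint:
    """One individual-to-individual constraint: `holder` has a non-negotiable demand about `target`."""
    holder: str
    target: str
    satisfied: Callable[[Outcome], bool]

# Hypothetical examples, only to illustrate the representation:
constraints = [
    HardConstraint("bob",   "dave",  lambda o: o["dave"] < 0.0),   # Bob demands that Dave be punished
    HardConstraint("dave",  "dave",  lambda o: o["dave"] >= 0.0),  # Dave refuses to be punished
    HardConstraint("steve", "steve", lambda o: o["steve"] >= 0.0), # Steve's demand about his own fate
]

def satisfies_all(outcome: Outcome) -> bool:
    # The aggregate constraint is just the conjunction of the individual hard constraints.
    return all(c.satisfied(outcome) for c in constraints)

def constraints_referring_to(person: str):
    # The subset over which `person` could be given a special say (e.g. Dave over constraints about Dave).
    return [c for c in constraints if c.target == person]

print(satisfies_all({"bob": 0.0, "dave": 0.0, "steve": 0.0}))            # False: Bob's and Dave's demands conflict
print([(c.holder, c.target) for c in constraints_referring_to("dave")])  # [('bob', 'dave'), ('dave', 'dave')]
```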
I also think that it would be a very bad idea to go looking for an extrapolation dynamic that re-writes Bob's values in a way that makes Bob stop wanting Dave to be punished (or that makes Bob bribable). I think that extrapolating Bob in an honest way, followed by giving Dave a say regarding those constraints that refer to Dave, is a more promising place to start looking for ways of keeping Dave safe from people like Bob. I for example think that this is less likely to result in unforeseen side effects (extrapolation is problematic enough without this type of added complexity. The option of designing different extrapolation dynamics for different groups of people is a bad option. The option of tweaking an extrapolation dynamic that will be used on everyone, with the intent of finding some mapping that will turn Bob into a safe person, is also a bad option).