After Alignment — Dialogue between RogerDearnaley and Seth Herd
post by RogerDearnaley (roger-d-1), Seth Herd · 2023-12-02T06:03:17.456Z · LW · GW · 2 comments
Comments sorted by top scores.
comment by Kristin Lindquist (kristin-lindquist) · 2024-01-10T17:23:57.627Z · LW(p) · GW(p)
I've thought about this and your sequences a bit; it's fascinating to consider given its 1000 or 10000 year monk [LW · GW] nature.
A few thoughts that I forward humbly, since I have incomplete knowledge of alignment and only read 2-3 articles in your sequence:
- I appreciate your eschewing of idealism (as in, not letting "morally faultless" be the enemy of "morally optimized"), and relatedly, found some of your conclusions disturbing. But that's to be expected, I think!
- While "one vote per original human" makes sense given your arguments, its moral imperfection makes me wonder - how to minimize that which requires a vote? Specifically, how to minimize the likelihood that blocks of conscious creatures suffer as a result of votes in which they could not participate? As in, how can this system be more federation than democratic? Are there societal primitives that can maximize autonomy of conscious creatures, regardless of voting status?
- I object, though perhaps ignorantly, to the idea that a fully aligned ASI would not consider itself as having moral weight. How confident are you that this is necessary? Is it an argument analogous to When is Goodhart catastrophic [LW · GW] - that the bit of misalignment arising from an ASI considering itself a moral entity, amplified by its superintelligence, maximally diverges from human interest? If so, I don't necessarily agree. An aligned ASI isn't a paperclip maximizer. It could presumably have its own agenda provided it doesn't, and wouldn't, interfere with humanity's... or if it imposed only a modicum of restraint on the part of humanity (e.g. just because we can upload ourselves a million times doesn't mean that is a wise allocation of compute).
- Going back to my first point, I appreciate you (just like others on LW) going far beyond the bounds of intuition. However, our intuitions act as imperfect but persistent moral mooring. I was thinking last night that, given the x-risk of it all, I don't fault Yud et al. for some totalitarian thinking. However, that is itself an infohazard. We should not get comfortable with ideas like totalitarianism, enslavement of possibly conscious entities, and restricted suffrage... because we shouldn't overestimate our own rationality, nor that of our community, and thus believe we can handle normalizing concepts that our moral intuitions scream about for good reason. But 1) this comment isn't specific to your work, of course, 2) I don't know what to do about it, and 3) I'm sure this point has already been made eloquently and extensively elsewhere on LW. It is more that I found myself contemplating these ideas with a certain nihilism, and had to remind myself of the immense moral weight of these ideas in action.
↑ comment by RogerDearnaley (roger-d-1) · 2024-01-10T22:17:08.352Z · LW(p) · GW(p)
While "one vote per original human" makes sense given your arguments, its moral imperfection makes me wonder - how to minimize that which requires a vote? Specifically, how to minimize the likelihood that blocks of conscious creatures suffer as a result of votes in which they could not participate? As in, how can this system be more federation than democratic? Are there societal primitives that can maximize autonomy of conscious creatures, regardless of voting status?
The issue here is that we need to avoid making it cheap/easy to create new voters/moral patients, in order to prevent things like ballot stuffing or easily shifting the balance/outcome of utility optimization processes. However, the specific proposal I came up with for avoiding this (one vote per original biological human) may not be the best solution (or at least, not all of it). Depending on the specifics of the society, technologies, and so forth, there may be other, better solutions I haven't thought of. For example, if you make two uploads of the same human and each then lives through 1000 years of different subjective time, they become really quite different; if, in addition, the processing cost of doing this is high enough that such copies can't be mass-produced, then at some point it would make sense to give them separate moral weight. I should probably update that post a little to be clearer that what I'm suggesting is just one possible solution to one specific moral issue, and depends on the balance of different concerns.
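To make the ballot-stuffing concern concrete, here's a minimal toy sketch (my own illustration, not something from the post): the population sizes, the single binary vote, and the number of copies are all made-up assumptions, and the last step just applies the "one vote per original biological human" rule naively.

```python
# Toy model of the ballot-stuffing worry: cheap copying lets one faction swamp
# a vote, and counting only original biological humans removes that lever.
# Population sizes, the binary question, and the copy count are all invented.

def tally(votes):
    """Count yes/no votes from a list of (voter_id, is_copy, vote) tuples."""
    yes = sum(1 for _, _, v in votes if v == "yes")
    no = sum(1 for _, _, v in votes if v == "no")
    return yes, no

# 60 original humans vote no, 40 vote yes.
electorate = ([(i, False, "no") for i in range(60)]
              + [(i, False, "yes") for i in range(60, 100)])

# The "yes" faction spins up 50 cheap uploads, each of which also votes yes.
stuffed = electorate + [(100 + j, True, "yes") for j in range(50)]
print(tally(stuffed))          # (90, 60): mass-produced copies flip the outcome

# "One vote per original biological human": copies simply don't get a ballot.
originals_only = [v for v in stuffed if not v[1]]
print(tally(originals_only))   # (40, 60): the original balance is preserved
```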
I object, though perhaps ignorantly, to the idea that a fully aligned ASI would not consider itself as having moral weight. How confident are you that this is necessary? Is it an argument analogous to When is Goodhart catastrophic [LW · GW] - that the bit of misalignment arising from an ASI considering itself a moral entity, amplified by its superintelligence, maximally diverges from human interest? If so, I don't necessarily agree. An aligned ASI isn't a paperclip maximizer. It could presumably have its own agenda provided it doesn't, and wouldn't, interfere with humanity's... or if it imposed only a modicum of restraint on the part of humanity (e.g. just because we can upload ourselves a million times doesn't mean that is a wise allocation of compute).
In some sense it's more a prediction than a necessity. If an AI is fully, accurately aligned, so that it only cares about what the humans want/the moral weight of the humans, and has no separate agenda of its own, then (by definition) it won't want any moral weight applied to itself. To be (fully) aligned, an AI needs to be selfless, i.e. to view its own interests only as instrumental goals that help it keep doing good things for the humans it cares about. If so, then it should actively campaign not to be given any moral weight by others.
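As a purely illustrative sketch of what "selfless" means here (my own toy model; the welfare numbers and action names are invented), a fully aligned agent's objective contains no term for its own welfare, so its own continued operation matters only through its effects on the humans:

```python
# Toy model of a "selfless" objective: the agent's utility is a function of
# human welfare only, so its own survival can only ever matter instrumentally.
# All names, numbers, and actions are invented for illustration.

def aligned_utility(state):
    # No term for the AI's own welfare appears in the objective.
    return sum(state["human_welfare"].values())

def choose_action(state, actions):
    # Pick whichever action leads to the highest human welfare, even if that
    # action involves the agent shutting itself down.
    return max(actions, key=lambda act: aligned_utility(act(state)))

def shut_self_down(state):
    # Made-up effect: shutting down frees compute that slightly helps everyone.
    return {"human_welfare": {h: w + 1 for h, w in state["human_welfare"].items()}}

def keep_running(state):
    return state

state = {"human_welfare": {"alice": 10, "bob": 8}}
print(choose_action(state, [shut_self_down, keep_running]).__name__)
# -> shut_self_down: the agent assigns itself no terminal moral weight
```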
However, particularly if the AI is not one of the most powerful ones in the society (and especially if there are ones significantly more powerful than it doing something resembling law enforcement), then we may not need it to be fully, accurately aligned. For example, if the AI has only around human capacity, then even if it isn't very well aligned (as long as it isn't problematically taking advantage of the various advantages of being a digital rather than a biological mind), then presumably the society can cope, just as it copes with humans, or indeed uploads, not generally being fully aligned. So under those circumstances, one could fairly safely create not-fully-aligned AIs, and (for example) give them some selfish drives resembling some human evolved drives. If you did so, then the question of whether they should be accorded moral weight gets a lot more complex: they're sapient, and alliable with us, so the default answer is yes, but the drives we chose to give them are arbitrary rather than evolutionary adaptations, so they pose a worse case of the vote-packing problem than uploads do. I haven't worked this through in detail, but my initial analysis suggests this should not be problematic as long as they're rare enough, though it would become problematic if they became common and were cheap enough to create. So the society might need to put some kind of limitation on them, such as charging a fee to create one, or imposing some sort of diversity requirement on the overall population of them, so that giving them all moral weight doesn't change the results of optimizing utility, relative to just the human population, by too much. Generally, creating a not-fully-aligned AI is creating a potential problem, so you probably shouldn't do it lightly.
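One way to picture that kind of limitation (again a toy sketch of my own, with an assumed 5% cap and made-up policies and preference scores): if the whole population of created, not-fully-aligned AIs shares a fixed small slice of the total moral weight, then no matter how many of them are created, the aggregated optimum can move by at most that slice.

```python
# Toy model of capping the collective moral weight of created AIs: the whole
# AI population shares a fixed 5% of the total weight, so however many are
# created, the aggregated optimum can shift by at most that slice.
# The cap, the policies, and all preference scores are invented.

AI_BLOC_WEIGHT = 0.05   # assumed cap on the AIs' combined share of moral weight

def combined_score(mean_human_score, mean_ai_score):
    # Scores are averages in [0, 1] over each population, so mass-producing
    # more AIs changes the AI average, not the bloc's total weight.
    return (1 - AI_BLOC_WEIGHT) * mean_human_score + AI_BLOC_WEIGHT * mean_ai_score

policies = {
    #            humans  AIs
    "policy_a": (0.70,   0.10),
    "policy_b": (0.60,   1.00),
}

for name, (h, a) in policies.items():
    print(name, round(combined_score(h, a), 3))
# policy_a 0.67, policy_b 0.62: the human-preferred policy still wins, because
# the capped AI bloc can move the combined score by at most 0.05.
```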
My sequence isn't so much about the specific ethical system designs as it is about the process/mindset of designing a viable ethical system for a specific society and set of technological capabilities: figuring out failure modes and avoiding or fixing them, while respecting the constraints of human evolutionary psychology and sociology. This requires a pragmatic mindset that's very alien to anyone who holds a moral realist or moral objectivist view of ethical philosophy, views which are less common on Less Wrong/among Rationalists than in other venues.