Computational Morality (Part 3) - Similarity of Proposals

post by David Cooper · 2018-04-24T22:12:29.095Z

In part 1 ( https://www.lesswrong.com/posts/Lug4n6RyG7nJSH2k9/computational-morality ) I set out a proposal for a system of machine ethics to govern the behaviour of AGI: you simply imagine that you are all the people and other sentiences involved in a situation and seek to minimise harm to yourself on that basis, though without eliminating any components of harm which are necessary as a means of accessing enjoyment that you calculate outweighs that harm, because harm of that kind is cancelled out by the gains. People assured me in the comments underneath that it was wrong, though their justifications appear to rest on faulty ideas which they seemed unable to explore, and it's also hard to gauge how well they took in the idea in the first place. One person, for example, suggested that my proposal might be Rule-Utilitarianism and provided the following link: https://en.wikipedia.org/wiki/Rule_utilitarianism . Here is the key part:-

"For rule utilitarians, the correctness of a rule is determined by the amount of good it brings about when followed. In contrast, act utilitarians judge an act in terms of the consequences of that act alone (such as stopping at a red light), rather than judging whether it faithfully adhered to the rule of which it was an instance (such as, "always stop at red lights"). Rule utilitarians argue that following rules that tend to lead to the greatest good will have better consequences overall than allowing exceptions to be made in individual instances, even if better consequences can be demonstrated in those instances."

This is clearly not my proposal at all, but it's interesting none the less. The kind of rules being discussed there are really just general guidelines that lead to good decisions being made in most situations without people having to think too deeply about what they're doing, but there can be occasions when it's moral to break such rules and it may even be immoral not to. Trying to build a system of morality based on a multitude of such rules is doomed from the outset because most (if not all) of those rules will be incorrect in some or many circumstances. The way we judge whether such a rule is right in one situation and wrong in another is by applying a higher-level moral rule (or set of rules), and all of morality is contained at that higher level.

Such rules are clearly useful as guidelines, and their imposition as binding laws can also be justified: if people were allowed to break them whenever they judged them unnecessary or harmful, greater harm would be done on all the occasions when they make those judgements incorrectly. But this relates to a practical implementation of morality for players who aren't competent enough to calculate correct action reliably by themselves in the time available. As such, it is a diversion away from the question of how morality actually works, and it should not be mixed in with the proposed solutions. (Given that AGI will often have to make rapid decisions that can't be thought through properly in time even using the best hardware and algorithms, AGI will likely need to follow the same kind of guideline-rules on occasion, but the optimised formulation of such derivative rules, both for people and machines, is a job for AGI which should be carried out by applying the rules of morality at the higher level, which already cover everything. Our first task is to frame the governing rules for AGI, and we should not allow ourselves to be diverted from that until the task is done.)
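To make that fall-back concrete, here is a minimal sketch of the kind of decision procedure described in the parenthesis above: run the full higher-level calculation when the time budget allows, and drop back to a pre-computed guideline-rule when it doesn't. Everything in it (the names, the time estimate, the guideline table) is an illustrative assumption rather than part of any existing system.

```python
import time

# Hypothetical sketch: every name here is illustrative, not part of any real system.
# The guideline table itself would be derived offline by applying the higher-level
# moral calculation to common classes of situation.

FULL_EVAL_ESTIMATE_S = 0.5  # assumed cost of running the full calculation

GUIDELINES = {
    "approaching_red_light": "stop",  # derivative rule, right in most but not all cases
}

def full_moral_evaluation(situation_kind):
    """Stand-in for the full higher-level calculation: slow but authoritative."""
    time.sleep(FULL_EVAL_ESTIMATE_S)  # pretend to deliberate
    return "best_action_for_" + situation_kind

def decide(situation_kind, time_budget_s):
    """Run the full calculation when time allows; otherwise fall back to a guideline."""
    if time_budget_s >= FULL_EVAL_ESTIMATE_S:
        return full_moral_evaluation(situation_kind)
    # Not enough time: use the cached guideline, accepting that it will sometimes
    # be wrong where the full calculation would not have been.
    return GUIDELINES.get(situation_kind, "halt_and_minimise_risk")

print(decide("approaching_red_light", time_budget_s=0.01))  # guideline path -> "stop"
print(decide("approaching_red_light", time_budget_s=2.0))   # full-evaluation path
```

The guideline path is cheaper but fallible, which is exactly why its formulation belongs to the higher-level rules rather than being a rival account of morality in its own right.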

After that suggestion, I was informed that my approach was Utilitarianism, and this was in turn narrowed down to Negative-Utilitarianism. That suggestion appears to be much closer to the mark, but it looks to me as if my proposed solution may be compatible with quite a few of the other proposals too, and I think that's because I'm following a different kind of approach which acts at a higher level than the rest. If so, that may mean that I've found a method by which all the other proposals can be judged (and which may also render them all redundant).

I'm being asked to select a pigeon hole for my proposed solution, but the categories in the current array of pigeon holes aren't well founded. What I see when I look at other proposals is that many of them don't fit well where they've been placed because they're hybrids of more than one category (or even of more than one supercategory), and the reason this is happening is that a proposal that's found to be wrong in one way simply sparks off a new version which may avoid that fault by borrowing a feature from a rival proposal of a different type. This reminds me of the way you can start from communism, socialism, liberalism or capitalism and tweak them until they all produce the exact same set of policies, at which point their names and imagined differences in philosophy become a complete irrelevance - they are guided towards the same position by a higher-level rule or set of rules which those political thinkers have failed to define (i.e. morality), and all of them can thereby end up being equally right.

Might the same thing be possible with all these rival morality proposals too? It's clear that many of them start with an inadequate idea which has to be modified repeatedly to eliminate faults, and it gradually mutates into something more and more compatible with actual morality. People can easily see the initial idea as wrong because they're judging it with some higher-level morality (which they are unable to define and don't realise they're using, but which they understand intuitively, at least in part). As the modifications eliminate the worst of the faults, the idea evolves towards greater compatibility with the higher-level morality, but it also becomes harder to identify the remaining faults because the differences become smaller over time and the direction required for further progress becomes less clear.

I've been asked to state whether my proposal is deontological or a kind of consequentialism, but that's not a good question: it can be both, and it may not be possible to know which it is. If you were actually going to live all the lives of the different players in a situation, there would be no duty imposed on you whatsoever, because you would always be the one suffering in unnecessary ways as a result of your own bad decisions. However, if you were only going to live one of those lives while other sentiences play the other roles, you would have a duty not to make them suffer unnecessarily, just as they would have a duty not to do that to you. The obligation comes out of the agreement you make to be fair to each other (as do any rights that might be defined by that agreement), whereas there is no such agreement for you to make with yourself if you're the only player. Even where you're the only player, though, there is still a "should" involved, in that there is something you should do in some situations in order to get the best outcome for yourself. So, is my approach deontological or not? The answer is that it can be both, although the duty aspect is derived from an agreement to play fair (and not all the players will have agreed to that, because not all of them care about being moral, but we are entitled to impose the agreement on them regardless and hold them to account over it).

The distinction between consequentialism and utilitarianism also appears to be of trivial importance. The former is supposedly broader because it can value things such as fairness and justice in their own right, but what are this fairness and justice that are held in such high regard? They're merely artefacts of the application of morality, and to put them ahead of morality in importance is a categorisation error caused by a failure to identify what's primary and what's secondary to it. Most importantly, it needn't lead to any non-utilitarian kind of consequentialism producing results that differ from utilitarianism in any situation. A lot of what we see in these categorisations is just a fixation on putting different slants on things, making no difference to any implementation of morality that results. So, for people who seek to build morality into AGI, it's already clear that a lot of the reading we're being asked to do is an unwarranted diversion away from the business we want to get on with - we don't need to get involved in arguments about which parts are derived from which when they have no impact on the rules.

Most interestingly, all the proposals that look viable appear to have the same idea at their core; namely that (in the words of John Rawls) moral acts are those that we would all agree to if we were unbiased. That's compatible with computational morality. The ideal observer idea is compatible with computational morality. The real observer idea is compatible with computational morality. Anything that isn't compatible with computational morality simply isn't going to fit with morality. It looks to me as if we have the solution already, and all the arguing is the result of people failing to work through thought experiments properly, or talking at cross purposes because they're understanding them differently. In the comments under Part 1, people told me they could show that my system was wrong, but they failed to do so and were easily silenced. Most of them just awarded negative votes while lacking the courage to speak up at all. What help is that to anyone? As I said there, if my system is wrong, I want to know so that I can abandon or modify it - I don't cling to broken ideas and would be only too happy if someone could show me where this one breaks, because that's how you make progress.

My proposal (that we calculate by imagining that we are all the players involved in a situation) forces us to be unbiased and to determine what's fair, and we'll make damned sure that it's fair because we are seeing ourselves as all the people involved and will do our best to ensure that we don't suffer more than necessary. AGI also needs to be unbiased in this way, so any proposed solution that favours the self over other players who are not the self is not right for AGI and can be rejected without hesitation. We are already simplifying things to a small set of proposals, although in the absence of a league table of them, I don't know whether I've seen them all or not. That's why I rely on feedback from the experts in this field - I'm an outsider, and so are most of the other people who are capable of building AGI. We are all looking for information, and none of us care for any diversions into issues about how many angels can dance on the head of a pin. Where are the leading proposals (stripped of artificial distinctions), and where is the discussion that's systematically trying to unify them into a system that can be used to govern AGI?
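For concreteness, here is a minimal sketch of that calculation under one crude reading of the proposal, assuming that harm and enjoyment can be given numeric scores for each participant (which is, of course, the genuinely hard part). All the names and numbers are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    participant: str
    harm: float       # suffering this action causes this participant
    enjoyment: float  # enjoyment this action gives this participant

def net_harm(outcomes):
    """Score an action as though a single self experienced every outcome.

    Harm which is outweighed by the enjoyment it gives access to is treated
    as cancelled out by that gain; only the uncompensated remainder counts.
    This is one crude aggregate reading of the proposal, not a definitive one.
    """
    total_harm = sum(o.harm for o in outcomes)
    total_enjoyment = sum(o.enjoyment for o in outcomes)
    return total_harm - min(total_harm, total_enjoyment)

def choose_action(actions):
    """Pick the candidate action with the least uncompensated harm."""
    return min(actions, key=lambda name: net_harm(actions[name]))

# Toy example with made-up numbers: two candidate actions, two participants.
actions = {
    "act_a": [Outcome("alice", harm=2.0, enjoyment=5.0),
              Outcome("bob",   harm=1.0, enjoyment=0.0)],
    "act_b": [Outcome("alice", harm=0.0, enjoyment=1.0),
              Outcome("bob",   harm=4.0, enjoyment=0.0)],
}
print(choose_action(actions))  # -> "act_a" with these numbers
```

The point of the sketch is only the structure: every participant's outcomes are pooled as if one self experienced them all, and harm that buys outweighing enjoyment is treated as cancelled by that gain.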

In part 4 (click on my name at the top to find the rest of this series), I'll go through a number of proposals and show what happens when you compare them with my system of computational morality, to give people an idea of how unification (or elimination) can be achieved.
