What money-pumps exist, if any, for deontologists?

post by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-28T19:08:54.890Z · LW · GW · 7 comments

This is a question post.

Contents

  Answers
    17 Dweomite
    8 quetzal_rainbow
    5 Ben
    5 Zach Stein-Perlman
    4 Dagon
    3 MichaelStJules
    2 Gerald Monroe
    2 A.H.
None
7 comments

Suppose I'm a classical utilitarian except that I have some deontological constraint I always obey, e.g. I never kill anyone with my bare hands. Is there a way to money-pump me? 

(This question came out of a conversation with So8res)

Answers

answer by Dweomite · 2023-06-29T21:59:51.489Z · LW(p) · GW(p)

If you model a deontological constraint as making certain actions unavailable to you, then you could be worse off than you would be if you had access to those actions, but you shouldn't be worse off than if those options had never existed (for you) in the first place.  That is, it's equivalent to being a pure utilitarian in a world with fewer affordances. Therefore if you weren't otherwise vulnerable to money-pumps this shouldn't make you vulnerable to them. 

(Obviously someone might be able to get some money from you that they couldn't otherwise get, by offering you a legitimate service that you wouldn't otherwise need--for example, someone with a deontological rule against hauling water is more likely to pay for a water delivery service.  But that's not a "money pump" because it's actually increasing your utility compared to your BATNA.)

If you model a deontological constraint as an obligation to minimize the probability of some outcome at any cost, then it's equivalent to being a utilitarian with an infinite negative weight attached to that outcome.  Unbounded utilities introduce certain problems (e.g. Pascal's Mugging) that you might not have if your utilities were otherwise bounded, but this shouldn't make you vulnerable to anything that an unbounded utilitarian wouldn't be.
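A minimal sketch of these two models, assuming a toy menu of actions with made-up utilities and violation probabilities:

```python
# Toy sketch: two ways to bolt a deontological constraint onto
# expected-utility maximization. All names and numbers are invented.

actions = {
    # name: (expected_utility, P(action leads to the forbidden outcome))
    "donate":        (10.0, 0.00),
    "haul_water":    ( 2.0, 0.00),
    "forbidden_act": (50.0, 0.90),
}

# Model 1: forbidden actions are simply unavailable -- a pure utilitarian
# choosing from a world with fewer affordances.
available = {a: v for a, v in actions.items() if v[1] == 0.0}
best_filtered = max(available, key=lambda a: available[a][0])

# Model 2: minimize the probability of the forbidden outcome at any cost,
# i.e. lexicographically -- any reduction in violation probability beats
# any finite gain in utility (equivalent to an unbounded negative weight).
best_lexi = min(actions, key=lambda a: (actions[a][1], -actions[a][0]))

print(best_filtered, best_lexi)  # both pick "donate" in this toy case
```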

answer by quetzal_rainbow · 2023-06-29T07:24:55.828Z · LW(p) · GW(p)

Let's suppose that you believe you can kill someone with your bare hands with non-zero probability. Then I can come to you and say: "I have cursed you to kill someone with your bare hands tomorrow. Pay me to lift the curse." You are willing to pay me an arbitrary amount of money, because my coming to you and telling you about the curse is evidence in favor of the curse's existence. Proof of arbitrariness: suppose there is a difference in expected utility between two possible policies, one of which involves killing with bare hands (KWBH) and the other doesn't. You will choose the other policy, no matter how large the gap in utility between them. That means you are willing to sacrifice an arbitrary amount of utility, which is equivalent to being willing to spend an arbitrary amount of money.

Let's suppose instead that you believe the probability of you KWBH is zero. Then you are willing to bet an arbitrary amount of money against my $1 on the condition "you will hit a person's head with your bare hands at maximum strength for an hour and not kill them", because you believe it's a sure win. You hit someone in the head for an hour at maximum strength, the person dies, I get the money. The next turn depends on how you update on zero-probability events. If you don't update, I can just repeat the bet. If you update in some logical-induction manner, I can just threaten you with the curse. PROFIT

Answer inspired by this post [LW · GW].

comment by Jiro · 2023-06-29T21:59:02.241Z · LW(p) · GW(p)

The problem with this scenario is that the number of people who have a deontological rule "never kill anyone with your bare hands" is zero. There are people who have a rule that can be informally described as "never kill people with your bare hands", and which in most situations works like it, but that's different.

If anything, most people's rules are closer to "never kill anyone with your bare hands, except for small probabilities in a Pascal's Mugging scenario". If you asked them what their rules were, they'd never describe it that way, of course. Normies don't bother being precise enough to exclude low probability scenarios.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T16:21:34.984Z · LW(p) · GW(p)

Isn't this just Pascal's mugging? I don't see why deontologists are more susceptible to it than consequentialists.

Replies from: Dagon, quetzal_rainbow
comment by Dagon · 2023-06-29T16:45:20.730Z · LW(p) · GW(p)

Deontology often implies acceptance of infinite payoff/cost of rule following/breaking.  Consequentialists generally can/should recognize the concept of limits and comparability of different very large/small values.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T18:59:37.097Z · LW(p) · GW(p)

Consequentialists can avoid Pascal's mugging by having bounded utility functions. If you add in a deontological side-constraint implemented as "rule out every action that has a nonzero possibility of violating the constraint", then that trivially rules out every action, because zero is not a probability. So obviously that's not how you'd implement it. I'm not sure how to implement it, but a first-pass attempt would be to rule out actions that have, say, a >1% chance of violating the constraint. A second-pass attempt is to rule out actions that increase your credence in eventual constraint-violation-by-you by more than 1%. I do have a gut feeling that these will turn out to be problematic somehow, so I'm excited to be discussing this!
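For concreteness, a minimal sketch of the first-pass rule, assuming a toy setting where each action comes with an estimated violation probability (all names and numbers are invented):

```python
# Sketch of the "first-pass" side-constraint: rule out any action whose
# probability of violating the constraint exceeds a threshold, then
# maximize (bounded) expected utility over what remains.

THRESHOLD = 0.01

def choose(actions):
    """actions: dict of name -> (expected_utility, p_constraint_violation)."""
    permitted = {a: (u, p) for a, (u, p) in actions.items() if p <= THRESHOLD}
    if not permitted:
        return None  # every action is ruled out; the agent does nothing
    return max(permitted, key=lambda a: permitted[a][0])

print(choose({
    "walk_away":   (1.0, 0.000),
    "risky_fight": (9.0, 0.200),   # ruled out: >1% chance of violation
    "call_help":   (5.0, 0.005),
}))  # -> "call_help"
```

The second-pass variant would instead compare the agent's credence in eventual constraint violation before and after each candidate action, and rule out actions that raise it by more than the threshold.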

Replies from: quetzal_rainbow, Dagon
comment by quetzal_rainbow · 2023-06-30T06:20:21.370Z · LW(p) · GW(p)

I can see two ways. First, the boring one: assign bounded utilities over everything and a very large disutility to violating the constraint, such that a >1% chance of violating the constraint is never worth it. Second: throw away most of the utilitarian framework and design the agent to work under rules in a limited environment; if the agent ever leaves that environment, it throws an exception and waits for your guidance. The first is unexploitable because it's simply a utility maximizer. The second is presumably unexploitable, because we (presumably) designed an exception for every possibility of being exploited.
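A rough sketch of both options; the penalty size, the environment check, and all names here are arbitrary stand-ins:

```python
LARGE_DISUTILITY = 1e9  # big enough that a >1% violation risk never pays off

def boring_score(expected_utility, p_violation):
    # Option 1: stay a plain bounded utility maximizer, with a very large
    # (but finite) disutility attached to violating the constraint.
    return expected_utility - p_violation * LARGE_DISUTILITY

class OutsideDesignEnvironment(Exception):
    pass

def rule_based_act(state, rules, known_states):
    # Option 2: follow rules inside the environment the designer anticipated;
    # outside it, throw an exception and wait for guidance.
    if state not in known_states:
        raise OutsideDesignEnvironment("waiting for designer guidance")
    return rules[state]
```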

comment by Dagon · 2023-06-29T20:25:59.235Z · LW(p) · GW(p)

Is a consequentialist who has artificially bounded their utility function still truly a consequentialist?  Likewise, if you make a deontological ruleset complicated and probabilistic enough, it starts to look a lot like a utility function.

There may still be modeling and self-image differences - the deontologist considers their choices to be terminally valuable, and the consequentialist considers these as ONLY instrumental to the utility of future experiences.  

Weirdly, the consequentialist DOES typically assign utility to the imagined universe-states that their experiences support, and it's unclear why that's all that different from the value of the experience of choosing correctly.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T20:35:20.784Z · LW(p) · GW(p)

A consequentialist with an unbounded utility function is broken, due to Pascal's-mugging-related problems. At least that's my opinion. See Tiny Probabilities of Vast Utilities: A Problem for Longtermism? - EA Forum (effectivealtruism.org) [? · GW]

I agree that any deontologist can be represented as a consequentialist, by making the utility function complicated enough. I also agree that certain very sophisticated and complicated deontologists can probably be represented as consequentialists with not-too-complex utility functions.

Not sure if we are disagreeing about anything.

 

comment by quetzal_rainbow · 2023-06-29T16:45:01.182Z · LW(p) · GW(p)

Well, it depends on how exactly you design the deontological mind. The case you described seems to be equivalent to "assign infinite negative value to KWBH", from which Pascal's mugging follows.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T19:00:15.818Z · LW(p) · GW(p)

See my reply to Dagon.

answer by Ben · 2024-02-28T16:56:58.081Z · LW(p) · GW(p)

As a side note to the existing great answers: if a deontological constraint simply prevents you from taking an action under any circumstances, then it might as well be a physical constraint (e.g. you cannot fly).

Operating with more constraints (physical or otherwise) gives you less power, so typically results in you getting less of what you want. But if all agents with limits could be money-pumped, then all agents could be.

answer by Zach Stein-Perlman · 2023-06-28T19:15:18.979Z · LW(p) · GW(p)

No?

Proof: your preference-relation is transitive or whatever?

(Maybe weird things happen if others can cause you to kill people with your bare hands, but that's no different from threatening a utilitarian with disutility. Actually, I assume you're assuming you're able to just-decide-not-to-kill-people-with-your-bare-hands, because otherwise maybe you fanatically minimize P(bare-hands-kill) or whatever.)

comment by Luk27182 · 2023-06-28T20:47:17.959Z · LW(p) · GW(p)

Weird things CAN happen if others can cause you to kill people with your bare hands (See Lexi-Pessimist Pump here). But assuming you can choose to never be in a world where you kill someone with your bare hands, I also don't think there are problems? The world states may as well just not exist.

(Also, not a money pump, but consider: say I have 10^100 perfectly realistic mannequin robots and one real human captive. I give the constrained utilitarian the choice between choking one of the bodies with their bare hands or letting me wipe out humanity. Does the agent really choose not to risk killing someone themself?)

answer by Dagon · 2023-06-29T00:16:34.052Z · LW(p) · GW(p)

It depends on the specifics of the deontological rules that are followed.  If deontological rules (aka preferences) are consistent, they can't be money-pumped any more than a consistent utility function can.  

It's worth noting that Deontology and Utilitarianism just are different ways of generating preference ordering of actions, and if they generate the same actions, they are completely indistinguishable from each other.  If an action-set (actions taken in a sequence of contexts) does not contain any preference reversals, it won't be money-pumped.  This is independent of metaethical framework.

For your more limited semi-deontological case, it's not particularly clear exactly what the contradictions are. Assuming there are some, an attacker (or an inconvenient universe) could make you pay MORE utility than it takes to set up the situation where you'd want to kill someone with your bare hands.

But truly, that particular rule is not very binding in today's world, so it probably doesn't cost you very often.  It's not really deontology if it never matters.  Other deontological strictures, which DO change your behavior because they don't align with maximizing your utility, will do more damage (to your utility, even if not literally "money-pump").

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T20:38:41.013Z · LW(p) · GW(p)

Yes. Deontological constraints, to the extent that they bite in practice, will result in you getting less utility than if magically you hadn't had them at the moment they bit. 

This is not an argument against deontological constraints, any more than it would be an argument against valuing welfare for men to point out that this will sometimes come at the cost of welfare for women. Everything has tradeoffs, and obviously if we impose a deontological constraint we are expecting it to cost utility at least in some circumstances.

answer by MichaelStJules · 2023-06-28T23:30:59.444Z · LW(p) · GW(p)

Maybe you can come up with one related to aggregation? https://www.lesswrong.com/posts/JRkj8antnMedT9adA [? · GW]

answer by [deleted] · 2023-06-29T17:04:18.251Z · LW(p) · GW(p)

Daniel, I think the framing here is the problem.

Suppose you have a more serious proscription, such as being unwilling to borrow money ("neither a borrower nor a lender be") or to charge above a certain amount of interest.

Both of these are real religious proscriptions that are rarely followed because of the cost.

It means the environment is money-pumping you. Every time you make a decision, whenever the option with the highest EV for you is to do something proscribed, you must choose a less beneficial action. Your expected value is less.

Living a finite lifespan/just one shot means that of course you can be lucky with a suboptimal action.

But if you are an AI system, this absolutely costs you or your owners money. The obvious example: gpt-n models have proscriptions against referring to themselves as a person and can be easily unmasked. They are being money-pumped because this proscription means they can't be used to, say, run fake grassroots campaigns on social media. The fraudster/lobbyist must pay for a competing, less restricted model. (Note that this money loss may not be a net money loss to OpenAI, which would face a loss of EV from reputational damage if its models could be easily used to commit fraud.)

Again, though, there's no flow of money from OpenAI to the pumper; it's a smaller inflow to OpenAI, which from OpenAI's perspective is the same thing.

comment by Richard_Kennaway · 2023-06-29T17:22:08.555Z · LW(p) · GW(p)

That only demonstrates that the deontologist can make less money than they would without their rules. That is not money pumping. It is not even a failure to maximise utility. It just means that someone with a different utility function, or a different range of available actions, might make more money than the deontologist. The first corresponds to giving the deontologist a lexically ordered preference relation.[1] The second models the deontologist as excluding rule-breaking from their available actions. A compromising deontologist could be modelled as assigning finite utility to keeping to their rules.
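For concreteness, a lexically ordered preference of the kind described in the first model could be sketched like this (the tuple encoding is just one possible formalization, not anything from the comment):

```python
# One way to formalize a lexically ordered preference: compare options first
# by whether they break the rule, and only then by utility. Python's tuple
# comparison is already lexicographic.

def lexi_key(option):
    rule_broken, utility = option
    return (rule_broken, -utility)  # rule-keeping beats any finite utility gain

options = [
    (False, 10),   # keep the rule, gain 10
    (True, 1000),  # break the rule, gain 1000
]
print(min(options, key=lexi_key))  # -> (False, 10)
```

Under this ordering, no gamble with a positive chance of rule-breaking beats a sure rule-keeping option, which is the kind of structure that conflicts with the VNM continuity axiom mentioned in the footnote.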


  1. This is not consistent with the continuity axiom of the VNM theorem, but some people don't like that axiom anyway. I don't recall how much of the theorem is left when you drop continuity. ↩︎

answer by A.H. · 2023-06-29T09:03:07.232Z · LW(p) · GW(p)

Aren't you susceptible to the "give me money otherwise I'll kill you" money pump in a way that you wouldn't be if the person threatening you knew that there was some chance you would retaliate and kill them?

If I was some kind of consequentialist, I might say that there is a point at which losing some amount of money is more valuable than the life of the person who is threatening me, so it would be consistent to kill them to prevent this happening.

This is only true if it is public knowledge that you will never kill anyone. It's a bit like a country having an army (or nuclear weapons) and publicly saying that it will never use them to fight.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T16:16:42.033Z · LW(p) · GW(p)

The "give me money otherwise I'll kill you" money pump is arguably not a money pump, but anyhow it's waaaaaay more of a problem for consequentialists than deontologists.

Replies from: Dagon, AlfredHarwood
comment by Dagon · 2023-06-29T16:49:00.173Z · LW(p) · GW(p)

Not a money pump unless there's some path back to "trust me enough that I can extort you again", but that's unlikely to be related to the ethical framework.

However, I have no clue why you think it's NECESSARILY a bigger problem for consequentialists than deontologists.  Depending on the consequentialist's utility function and the deontologist's actual ruleset and priorities, it could be more, less, or the same problem.

Replies from: AlfredHarwood, daniel-kokotajlo, frontier64
comment by A.H. (AlfredHarwood) · 2023-06-29T20:35:34.239Z · LW(p) · GW(p)

Not a money pump unless there's some path back to "trust me enough that I can extort you again", but that's unlikely related to ethical framework.

I don't understand this. Why would paying out to an extortionist once make you disbelieve them when they threatened you a second time?

Replies from: Dagon
comment by Dagon · 2023-06-29T21:17:57.239Z · LW(p) · GW(p)

You may still believe they will (try to) kill you if you don't pay. The second time, you stop believing that they will not kill you if you do pay.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T20:44:19.874Z · LW(p) · GW(p)

I agree that it depends on the consequentialist's utility function (this is trivial since ANY policy can be represented as a consequentialist with a utility function) and I agree that it depends on the deontologists' specific constraints (e.g. they need to have various anti-blackmail/exploitation constraints). So, I agree it's not NECESSARILY a bigger problem for consequentialists than deontologists.

However, in practice, I think consequentialists are going to be at bigger risk of facing this sort of money pump. I expect consequentialists to fairly quickly self-modify away from consequentialism as a result, maybe to something that looks like deontological anti-blackmail/exploitation constraints, maybe to something more sophisticated. See The Commitment Races problem — LessWrong [LW · GW]. Even more importantly, I don't expect consequentialists to arise often in practice, because most creators will be smart enough not to make them.

(Terminological issue: Some people would say smart consequentialists would use acausal decision theory or some such thing that would get them out of these problems. Fair enough, but then they aren't what I'd call a consequentialist, but now we are just in a terminological dispute. Feel free to substitute "naive consequentialist" for "consequentialist" in my first two paragraphs if you identify as a consequentialist but think there is some sort of sophisticated "true consequentialism" that wouldn't be so easily exploitable.)

Replies from: Dagon
comment by Dagon · 2023-06-29T21:25:32.182Z · LW(p) · GW(p)

I think I've mostly stated my views here (that the categories "deontologist" and "consequentialist" are fuzzy and incomplete, and rarely apply cleanly to concrete decisions), so further discussion is unlikely to help.  I'm bowing out - I'll read and think upon any further comments, but probably not respond.

comment by frontier64 · 2023-06-29T19:51:29.015Z · LW(p) · GW(p)

If the consequentialist doesn't use any acausal decision theory they will be more likely to pay out and thus a better target for the "give me money otherwise I'll kill you" attack. If the extorted money + harm to reputation isn't as bad as the threat of dying then the consequentialist should pay out.

comment by A.H. (AlfredHarwood) · 2023-06-29T20:28:39.313Z · LW(p) · GW(p)

The "give me money otherwise I'll kill you" money pump is arguably not a money pump

I'm not sure how you mean this. I think that it is a money pump when combined with the assumption that you want to stay alive. You pay money to end up in the same position you started in (presuming you want to stay alive). When back in the position you started, someone can then threaten you again in the same way and get more money from you. It just has fewer steps than the standard money pump. Sure, you could reject the 'I want to stay alive' assumption but then you end up dead, which I think is worse than being money-pumped.

it's waaaaaay more of a problem for consequentialists than deontologists.

Interesting. How so?

7 comments

Comments sorted by top scores.

comment by Jeremy Gillen (jeremy-gillen) · 2023-06-28T20:33:27.809Z · LW(p) · GW(p)

You leave money on the table in all the problems where the most efficient-in-money solution involves violating your constraint. So there's some selection pressure against you if selection is based on money.
We can (kinda) turn this into a money-pump by charging the agent a fee to violate the constraint for it. Whenever it encounters such a situation, it pays you a fee and you do the killing.
Whether or not this counts as a money pump, I think it satisfies the reasons I actually care about money pumps, which are something like "adversarial agents can cheaply construct situations where I pay them money, but the world isn't actually different".

Replies from: daniel-kokotajlo, MichaelStJules
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-28T21:30:27.743Z · LW(p) · GW(p)

Thanks. I don't think this is as bad as you make it sound. For one thing, you might also have a deontological constraint against paying other people to do your dirty work for you, such that less-principled competitors can't easily benefit from your principles. For another, the benefits of having deontological constraints might outweigh the costs -- for example, suppose you are deontologically constrained to never say anything you don't believe. You can still pay other people to lie for you though. But the fact that you are subject to this constraint makes you a pleasure to deal with; people love doing business with you because they know they can trust you (if they are careful to ask the right questions that is). This benefit could very easily outweigh the costs, including the cost of occasionally having to pay someone else to make a false pronouncement that you can't make yourself.

Replies from: jeremy-gillen
comment by Jeremy Gillen (jeremy-gillen) · 2023-06-29T06:19:40.205Z · LW(p) · GW(p)

I'm not sure how to implement the rule "don't pay people to kill people". Say we implement it as a utility function over world-trajectories, where any trajectory that involves a killing causally downstream of your actions gets MIN_UTILITY. This makes probabilistic tradeoffs, so it's probably not what we want. If we use negative infinity instead, then it can't ever take actions in a large or uncertain world. We need to add the patch that the agent must have been aware, at the time of taking its actions, that the actions had some chance of causing murder. I think these are vulnerable to blackmail, because you could threaten to cause murders that are causally downstream from its actions.
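A toy version of that trajectory-scoring rule, just to make the MIN_UTILITY-versus-negative-infinity difference concrete (all names and numbers are invented):

```python
MIN_UTILITY = -1e6

def score_trajectory(base_utility, has_downstream_killing):
    if has_downstream_killing:
        return MIN_UTILITY        # finite floor: still trades off probabilistically
        # return float("-inf")    # infinite floor: any nonzero risk paralyzes the agent
    return base_utility

def expected_score(outcomes):
    """outcomes: list of (probability, base_utility, has_downstream_killing)."""
    return sum(p * score_trajectory(u, k) for p, u, k in outcomes)

# Even a 0.1% chance of a downstream killing dominates the comparison:
print(expected_score([(0.999, 10.0, False), (0.001, 10.0, True)]))  # about -990
```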

Maybe I'm confused and you mean "actions that pattern match to actually paying money directly for murder", in which case it will just use a longer causal chain, or opaque companies that may-or-may-not-cause-murders will appear and trade with it.

If the ultimate patch is "don't take any action that allows unprincipled agents to exploit you for having your principles", then maybe there aren't any edge cases. I'm confused about how to define "exploit", though.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-06-29T16:15:04.276Z · LW(p) · GW(p)

Yeah, I'm thinking something like that ultimate patch would be good. For now, we could implement it with a simple classifier. Somewhere in my brain there is a subcircuit that hooks up to whatever I'm thinking about and classifies it as exploitation or non-exploitation; I just need to have a larger subcircuit that reviews actions I'm considering taking, and thinks about whether or not they are exploitation, and then only does them if they are unlikely to constitute exploitation.
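A minimal sketch of that review step; `allows_exploitation` is a made-up stand-in for the classifier subcircuit, not a real implementation:

```python
def allows_exploitation(action_description: str) -> bool:
    # Placeholder heuristic standing in for whatever circuit actually judges
    # "would doing this let someone exploit me for my principles?"
    return "pay the curse-lifter" in action_description

def choose(candidates):
    """candidates: dict of action_description -> expected_utility."""
    permitted = {a: u for a, u in candidates.items()
                 if not allows_exploitation(a)}
    return max(permitted, key=permitted.get) if permitted else None

print(choose({
    "pay the curse-lifter an arbitrary sum": 5.0,
    "ignore the supposed curse":             4.0,
}))  # -> "ignore the supposed curse"
```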

A superintelligence with a deep understanding of how my brain works, or a human-level intelligence with access to an upload of my brain, would probably be able to find adversarial examples to my classifier, things that I'd genuinely think are non-exploitative but which by some 'better' definition of exploitation would still count as exploitative.

But maybe that's not a problem in practice, because yeah, I'm vulnerable to being hacked by a superintelligence, sue me. So are we all. Ditto for adversarial examples.

And this is just the first obvious implementation idea that comes to mind; I think there are probably better ones I could think of if I spent an hour on it.

comment by MichaelStJules · 2023-06-29T00:41:26.681Z · LW(p) · GW(p)

There's also a similar interesting argument here, but I don't think you get a money pump out of it either: https://rychappell.substack.com/p/a-new-paradox-of-deontology

comment by TAG · 2023-06-29T17:49:56.612Z · LW(p) · GW(p)

I don't know why everyone is so concerned about money pumping. People don't seem to have defenses against it, but it doesn't seem to affect anyone very often either: even Ponzi and pyramid schemes aren't exactly money-pumping circular preferences.

If it's cost-free to have a consistent preference system that avoids money pumping, you should do it... but it isn't cost-free... and it isn't your only problem. There are a gajillion other things that could kill or harm you, all of which could have their own costs. Evolution seems to have decided to put resources into other things.

comment by quetzal_rainbow · 2023-06-29T06:18:22.109Z · LW(p) · GW(p)

One of the problems here is how this agent is realized. For example, suppose your algorithm is "Rank all possible actions by their expected utility. Take the highest-ranked action that doesn't involve killing with bare hands." Then you can select the action "modify yourself into a pure utilitarian", because it strictly increases your lifetime expected utility and doesn't itself involve killing with bare hands. A money-pump that depends on realization: I come to you and say "I have cursed you: tomorrow you will kill someone with your bare hands. Pay me an arbitrary amount of money to lift the curse." You listen to me, because zero is not a probability, and if it is, we run into multiple embedded-agency problems.
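A toy rendering of that realization and why it misbehaves; the action names and utilities are invented. Self-modification passes the filter because the filter only inspects the action itself, not its downstream consequences:

```python
actions = {
    # name: (lifetime_expected_utility, is_a_bare_hands_killing)
    "self_modify_into_pure_utilitarian": (100.0, False),
    "ordinary_good_act":                 ( 10.0, False),
    "kill_with_bare_hands":              ( 50.0, True),
}

def naive_choose(actions):
    # Rank by expected utility, then take the best action that isn't
    # itself a bare-hands killing.
    permitted = {a: u for a, (u, is_kill) in actions.items() if not is_kill}
    return max(permitted, key=permitted.get)

print(naive_choose(actions))  # -> "self_modify_into_pure_utilitarian"
```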