Formalizing reflective inconsistency
post by Johnicholas · 2009-09-13T04:23:04.076Z
In the post Outlawing Anthropics, there was a brief and intriguing scrap of reasoning that used the principle of reflective inconsistency - a principle which, so far as I know, is unique to this community:
If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.
This post expands upon and attempts to formalize that reasoning, in hopes of developing a logical framework for reasoning about reflective inconsistency.
In diagramming and analyzing this, I encountered a difficulty. There are probably many ways to resolve it, but in resolving it, I basically changed the argument. You might have reasonably chosen a different resolution. Anyway, I'll explain the difficulty and where I ended up.
The difficulty: the text "...maximizes your expectation of pleasant experience over future selves." How would you compute expectation of pleasant experience? It ought to depend intensely on the situation. For example, a flat future, with no opportunity to influence my experience or that of my sibs for better or worse, would argue that caring for sibs has exactly the same expectation as not-caring. Alternatively, if a mad Randian were experimenting on me, rewarding selfishness, not caring for my sibs might well yield more pleasant experiences than caring. Also, I don't know how to compute with experiences - Total Utility, Average Utility, Rawlsian Minimum Utility, some sort of multiobjective optimization? Finally, I don't know how to compute with future selves. For example, imagine some sort of bicameral cognitive architecture, where two individuals have exactly the same percepts (and therefore choose exactly the same actions). Should I count that as one future self or two?
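To make the aggregation problem concrete, here is a minimal Python sketch (my illustration, not part of the original post): three candidate ways of aggregating "pleasant experience over future selves" are applied to the same two outcome profiles and disagree about which policy is better. The profiles and all numbers are invented assumptions chosen purely for illustration.

```python
# A minimal sketch (not from the post): three aggregation rules applied to
# the same two invented outcome profiles.  All numbers are assumptions.

def total_utility(outcomes):
    return sum(outcomes)

def average_utility(outcomes):
    return sum(outcomes) / len(outcomes)

def rawlsian_minimum(outcomes):
    return min(outcomes)

# Hypothetical experience levels for three future selves under two policies.
caring     = [6, 6, 6]    # every copy moderately well off
not_caring = [20, 1, 1]   # one copy does very well, the others badly

for name, aggregate in [("total", total_utility),
                        ("average", average_utility),
                        ("rawlsian min", rawlsian_minimum)]:
    preferred = "caring" if aggregate(caring) > aggregate(not_caring) else "not-caring"
    print(f"{name:>12}: caring={aggregate(caring):.1f}, "
          f"not-caring={aggregate(not_caring):.1f} -> prefers {preferred}")

# Under these made-up numbers, total and average utility prefer not-caring,
# while the Rawlsian minimum prefers caring: the choice of rule matters.
```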
To resolve this, I replace EY's reason with an argument from analogy, like so:
If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, for the same reasons that the process of evolution created kin altruism.
Here is the same argument again, "expanded". Remember, the primary reason to expand it is not readability - the expanded version is certainly less readable - but to take a step towards a generally applicable scheme for reasoning with the principle of reflective inconsistency.
At first glance, the mechanism of natural selection seems to explain selfish, but not unselfish, behavior. However, the structure of the EEA (the environment of evolutionary adaptedness) seems to have offered sufficient opportunities for kin to recognize kin with low-enough uncertainty, and to assist at small-enough cost to the helper and large-enough benefit to the helped, that unselfish entities do outcompete purely selfish ones. Note that the policy of selfishness is sufficiently simple that it was almost certainly tried many times. We believe that unselfishness is still a winning strategy in the present environment, and will continue to be a winning strategy in the future.
The two policies, caring about sibs and not-caring, do in fact behave differently in the EEA, and so they are incompatible - we cannot behave according to both policies at once. Also, since caring about sibs outcompetes not-caring in the EEA, if a not-caring agent, X, were selecting a proxy (or "future self") to compete in an EEA-tournament for utilons (or paperclips), X would pick a caring agent as proxy. The policy of not-caring would choose to delegate to an incompatible policy. This is what "reflectively inconsistent" means. Given a particular situation S1, one can always construct another situation S2 in which the choices available in S2 correspond to policies to send as proxies into S1. One might understand the new situation as having an extra "self-modification" or "precommitment" choice point at the beginning. If a policy chooses an incompatible policy as its proxy, then that policy is "reflectively inconsistent" on that situation. Therefore, not-caring is reflectively inconsistent on the EEA.
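Here is a toy Python sketch of the S1/S2 construction (my own rendering, not the Automath formalization referenced below). The payoff numbers, and the simplification that at the S2 choice point a policy delegates to whichever proxy maximizes its S1 payoff, are assumptions chosen so that caring outcompetes not-caring, as the kin-altruism argument claims it does in the EEA.

```python
# A toy sketch (not the post's Automath code) of the S1/S2 construction.
# The payoffs are invented assumptions under which caring outcompetes
# not-caring in S1, mirroring the kin-altruism claim about the EEA.

# S1: a stylized EEA-tournament in which each policy earns some quantity
# of utilons (or paperclips).
S1_PAYOFFS = {
    "caring": 10,      # assists xerox-siblings and benefits from being assisted
    "not_caring": 7,   # keeps its local resources and forgoes those benefits
}

def chosen_proxy(payoffs):
    # Simplifying assumption: at the self-modification choice point that S2
    # prepends to S1, a policy delegates to whichever proxy maximizes its
    # payoff in S1.
    return max(payoffs, key=payoffs.get)

def reflectively_inconsistent(policy, payoffs):
    # A policy is reflectively inconsistent on S1 if, in S2, it would
    # delegate to a policy incompatible with itself.
    return chosen_proxy(payoffs) != policy

for policy in S1_PAYOFFS:
    print(policy, "is reflectively inconsistent on S1:",
          reflectively_inconsistent(policy, S1_PAYOFFS))
# With these assumed payoffs: caring -> False, not_caring -> True,
# i.e. not-caring is reflectively inconsistent on the EEA-like situation.
```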
The last step to the conclusion is less interesting than the part about reflective inconsistency. The conclusion is something like: "Other things being equal, prefer caring about sibs to not-caring".
Enough handwaving - to the code! My (crude) formalization is written in Automath, and to check my proof, the command (on GNU/Linux) is something like:
aut reflective_inconsistency_example.aut
13 comments
comment by cousin_it · 2009-09-13T10:00:07.605Z
Talking about utility in general doesn't seem to be enough: you need a specific utility function that values inclusive genetic fitness. Kin altruism happens because we share many genes with our kin, so it likely wouldn't arise in an environment without some equivalent of genes. I don't see where your argument uses genes, so it must've gone wrong somewhere.
That said, your paragraph about "not knowing how to compute" looks very right-headed to me, and I'd certainly like to see more in that direction.
comment by Johnicholas · 2009-09-13T13:52:06.541Z
"Talking about utility in general doesn't seem to be enough."
Right. Imagine replacing the problematic bit with something about utility:
"If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your utility."
"This maximizes your utility" is strictly falsified by the flat-future and mad Randian scenarios, so we must be assuming something about the structure of the present scenario. My guess was that we're assuming that the future will be similar ENOUGH to the EEA that unselfishness, at least between xerox-siblings, will continue to be a more effective strategy than selfishness. This is an assumption about the structure of the "game" of the future (e.g. we will be able to recognize our xerox-siblings when we meet them, assistance will sometimes be possible and not too costly, et cetera.).
"You need a specific utility function that values inclusive genetic fitness."
I think this argument works even if everyone is a paperclip maximizer. Suppose that some local resource (e.g. calories) was useful in maximizing paperclips. If the "game" of the future has structure similar enough to the EEA, then spending those resources to "assist" a sib is a more-winning strategy.
comment by tut · 2009-09-13T16:23:35.976Z
...if everyone is a paperclip maximizer...
Then everyone has identical preferences, so there is no difference between altruism and selfishness. Doing whatever makes the most paperclips maximizes every paperclip maximizer's utility.
And evolution only tells you what gives the most genetic fitness, not what makes any person happy.
comment by Johnicholas · 2009-09-13T18:08:20.895Z
Everyone may have identical preferences, and the preferences may not be what we would call "altruism", but they also have behavior. Do the paperclip maximizers assist their sibs or not?
In order to conclude "In general, prefer caring about sibs to not-caring", we need to be working inside a scenario where assisting is a more-winning strategy. I believe the best evidence that we're currently in a scenario where assisting is a more-winning strategy is the loose similarity of our present scenario to the EEA.
comment by Alicorn · 2009-09-13T19:07:40.657Z
Paperclip maximizers will help anyone (including a sibling) who will, if and only if so assisted, go on to more than recoup the paperclip-generation value of the resources expended by the assistance in their future endeavors at paperclip maximization.
ETA: Or anyone whose chances of more than recouping said resource expenditure are good enough, or the foreseeable recoupage great enough, that the expected result of assistance is more paperclipful than the expected result of not helping.
comment by Johnicholas · 2009-09-13T19:53:03.449Z
Exactly. There is a difference between assistance and nonassistance, and the only way one can recommend assistance is if the SCENARIO is such that assistance leads to better results, whatever "better" means to you. For paperclip maximizers, that's paperclips.
If assistance were unavailable, of zero use, or of actively negative use, then one would not endorse it over nonassistance. I've been trying to convince people that the injunction to prefer assisting one's sibs over not assisting is scenario-dependent.
comment by Alicorn · 2009-09-13T19:56:50.918Z
Perhaps of import is that if paperclip maximizer A is considering whether to help paperclip maximizer B, B will only want A to take paperclip-maximizing actions. Cognitively sophisticated paperclip maximizers want everybody to want to maximize paperclips over all else. There is no obvious way in which any action could be considered helpful-to-B unless that action also maximizes paperclips, except on axes that don't matter to B except inasmuch as they maximize paperclips. A real paperclip maximizer will, with no internal conflict whatever, sacrifice its own existence if that act will maximize paperclips. The two paperclip maximizers have identical goals that are completely external to their own experiences (although they will react to their experiences of paperclips, what they want are real paperclips, not paperclip experiences). Most real agents aren't quite like that.
comment by Johnicholas · 2009-09-13T21:31:09.576Z
Perhaps an intuition pump is appropriate at this point, explicating what I mean by the verb "assist".
Alfonse, the paperclip maximizer, decides that the best way to maximize paperclips is to conquer the world. In pursuit of the subgoal of conquering the world, Alfonse transforms itself into an army. After a fierce battle with some non-paperclip-ists, an instance of Alfonse considers whether to bind a different, badly injured Alfonse's wound.
Binding the wound will cost some time and calories, resources which might be used in other ways. However, if the wound is bound, the other Alfonse may have an increased chance of fighting in another battle.
By "assistance" I mean "spending an instance-local resource in order that another instance obtains (or doesn't lose) some resources local to the other instance".
comment by Vladimir_Nesov · 2009-09-13T21:36:16.918Z
The two instances should agree on the solution, whatever it is.
comment by Nick_Tarleton · 2009-09-13T21:08:59.877Z
But EY's statement is about terminal values, not injunctions.
Say that at time t=0, you don't care about any other entities that exist at t=0, including close copy-siblings; and that you do care about all your copy-descendants; and that your implementation is such that if you're copied at t=1, by default, at t=2 each of the copies will only care about itself. However, since you care about both of those copies, their utility functions differ from yours. As a general principle, your goals will be better fulfilled if other agents have them, so you want to modify yourself so that your copy-descendants will care about their copy-siblings.
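A minimal Python sketch of that self-modification argument (my numbers, not the comment's): each t=2 copy picks between a "selfish" and a "caring" action, and the t=0 agent, which values both copy-descendants equally, compares leaving the copies selfish against modifying them to care about each other. The payoff values are invented assumptions.

```python
# A minimal sketch with invented payoffs: each t=2 copy chooses either a
# "selfish" action (5 to itself, 0 to its sibling) or a "caring" action
# (3 to itself, 4 to its sibling).

ACTIONS = {"selfish": (5, 0), "caring": (3, 4)}

def copy_choice(cares_about_sibling: bool) -> str:
    """What one copy does at t=2, given the goals it was left with."""
    def copy_utility(action):
        to_self, to_sib = ACTIONS[action]
        return to_self + to_sib if cares_about_sibling else to_self
    return max(ACTIONS, key=copy_utility)

def t0_utility(cares_about_sibling: bool) -> int:
    """The t=0 agent values both copy-descendants' payoffs equally."""
    to_self, to_sib = ACTIONS[copy_choice(cares_about_sibling)]
    # Two symmetric copies: each receives its own to_self plus the
    # sibling's to_sib.
    return 2 * (to_self + to_sib)

print("leave copies selfish:", t0_utility(False))   # 2 * (5 + 0) = 10
print("modify copies to care:", t0_utility(True))   # 2 * (3 + 4) = 14
# Since 14 > 10 under these assumed payoffs, the t=0 agent prefers to
# self-modify so that its copy-descendants care about each other.
```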
comment by Johnicholas · 2009-09-13T21:51:10.929Z
I disagree with your first claim (the statement is too brief and ambiguous to say definitively what it "is about"), but I don't want to argue it. Let's leave that kind of interpretationism to the scholastic philosophers, who spend vast amounts of effort figuring out what various famous ancients "really meant".
The principle "Your goals will be better fulfilled if other agents have them" is very interesting, and I'll have to think about it.
comment by Nick_Tarleton · 2009-09-13T17:56:45.427Z
For example, a flat future, with no opportunity to influence my experience or that of my sibs for better or worse, would argue that caring for sibs has exactly the same expectation as not-caring.
In this case, for your future selves to care about each other is no worse than for them not to, so if the future might not be flat it increases expectations.
Alternatively, if a mad Randian were experimenting on me, rewarding selfishness, not caring for my sibs might well yield more pleasant experiences than caring.
This scenario introduces a direct dependence of outcomes on your goal system, not just your actions; this does complicate things and it's common to assume it's not the case.
Also, I don't know how to compute with experiences - Total Utility, Average Utility, Rawlsian Minimum Utility, some sort of multiobjective optimization? Finally, I don't know how to compute with future selves. For example, imagine some sort of bicameral cognitive architecture, where two individuals have exactly the same percepts (and therefore choose exactly the same actions). Should I count that as one future self or two?
I don't know how your (or my) morality answers these questions, but however it answers them is what it would want to bind future selves to use. The real underlying principle, of which EY's statement is a special case, is "see to it that other agents share your utility function, or something as close to it as possible."
comment by Johnicholas · 2009-09-13T18:54:11.259Z
Would you argue that it is always better to assist one's xerox-sibs, than not?
My intention in offering those two "pathological" scenarios was to argue that there is an aspect of scenario-dependence in the general injunction "assist your xerox-sibs".
You've disposed of my two counterexamples with two separate counterarguments. However, you haven't offered an argument for scenario-INDEPENDENCE of the injunction.
Your last sentence contains a very interesting guideline. I don't think it's really an analysis of the original statement, but that's a side question. I'll have to think about it some more.