# Motivating a Semantics of Logical Counterfactuals

post by Sam_A_Barnett · 2017-09-22T01:10:27.759Z

(**Disclaimer:** *This post was written as part of the CFAR/MIRI AI Summer Fellows Program, and as a result is a vocalisation of my own thought process rather than a concrete or novel proposal. However, it is an area of research I shall be pursuing in the coming months, and so ought to lead to the germination of more ideas later down the line. Credit goes to Abram Demski for inspiring this particular post, and to Scott Garrabrant for encouraging the AISFP participants to actually post something.*)

When reading Soares' and Levinstein's recent publication on decision theory, *Death in Damascus*, I was struck by one particular remark:

> Unfortunately for us, there is as yet no full theory of counterlogicals [...], and for [functional decision theory (FDT)] to be successful, a more worked out theory is necessary, unless some other specification is forthcoming.

For some background: an FDT agent must consider counterfactuals about what *would* happen if its decision algorithm, on a given input, were to output a particular action. If the algorithm is deterministic (or the action under consideration is simply outside the range of possible outputs of the agent), then it is a logical impossibility that the algorithm produces a different output. Hence the initial relevance of logical counterfactuals, or counterlogicals: counterfactuals whose antecedents represent a *logical* impossibility.
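A toy illustration of what such a counterfactual is supposed to compute, as a minimal sketch under stated assumptions: the Newcomb-style payoffs and the names `newcomb_payoff` and `fdt_style_choice` are hypothetical, not taken from the paper. The predictor and the player both call the same decision procedure, and "what if my algorithm output *a*?" is modelled by substituting a constant procedure for every call.

```python
from typing import Callable

Decision = Callable[[], str]  # a decision procedure, run on a fixed input


def newcomb_payoff(decision: Decision) -> int:
    """Toy Newcomb problem (illustrative payoffs). The predictor fills the
    opaque box by running the same decision procedure the player later runs."""
    predicted = decision()
    opaque = 1_000_000 if predicted == "one_box" else 0
    action = decision()
    return opaque if action == "one_box" else opaque + 1_000


def fdt_style_choice(actions=("one_box", "two_box")) -> str:
    """Evaluate 'what if my algorithm output a?' by replacing *every* call to
    the algorithm with a constant procedure returning a; pick the best a."""
    return max(actions, key=lambda a: newcomb_payoff(lambda: a))


print(newcomb_payoff(lambda: "one_box"))  # 1000000
print(newcomb_payoff(lambda: "two_box"))  # 1000
print(fdt_style_choice())                 # one_box
```

Note what the sketch quietly assumes: that the decision procedure is a parameter which can be swapped out. For a real agent it is one fixed, deterministic program, so substituting a different output is exactly the logically impossible antecedent whose meaning is at issue.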

My immediate thought about how to tackle the problem of finding a full theory of counterlogicals was to find the right logical *system* that captures counterlogical inference in a way adequate to our demands. For example, Soares and Fallenstein show in *Toward Idealized Decision Theory* that the principle of explosion (from a contradiction, anything can be proved) leads to problematic results for an FDT solution. Why not simply posit that our decision theory uses a paraconsistent logic in order to block some of these inferences?
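To make the contrast concrete, here is a brute-force validity checker for two small propositional logics: classical two-valued logic, and Priest's Logic of Paradox (LP), one standard paraconsistent logic, swapped in purely as an example rather than as the system the post has in mind. In LP a sentence can be true, false, or both (encoded here as 0.5); negation maps v to 1 - v, disjunction takes the maximum, and both "true" and "both" count as designated. The names `valid`, `CLASSICAL` and `LP` are illustrative.

```python
from itertools import product

# Truth values: 0 = false, 1 = true, 0.5 = "both" (LP only).
CLASSICAL = {"values": (0, 1), "designated": {1}}
LP = {"values": (0, 0.5, 1), "designated": {0.5, 1}}


def neg(v):
    return 1 - v


def disj(a, b):
    return max(a, b)


def valid(logic, atoms, premises, conclusion):
    """True iff every valuation that designates all premises also designates
    the conclusion (checked by brute force over all valuations)."""
    for combo in product(logic["values"], repeat=len(atoms)):
        v = dict(zip(atoms, combo))
        if all(p(v) in logic["designated"] for p in premises):
            if conclusion(v) not in logic["designated"]:
                return False
    return True


P = lambda v: v["P"]
not_P = lambda v: neg(v["P"])
Q = lambda v: v["Q"]
P_or_Q = lambda v: disj(v["P"], v["Q"])

# Explosion: P, not-P |- Q
print(valid(CLASSICAL, ["P", "Q"], [P, not_P], Q))  # True
print(valid(LP, ["P", "Q"], [P, not_P], Q))         # False
# Addition: P |- P or Q
print(valid(CLASSICAL, ["P", "Q"], [P], P_or_Q))    # True
print(valid(LP, ["P", "Q"], [P], P_or_Q))           # True
```

Explosion holds classically only vacuously, because no valuation designates both P and not-P; LP supplies such a valuation (P = both, Q = false), so the inference fails while ordinary inferences such as P ⊢ P ∨ Q survive.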

Instead, and to my surprise, Soares and Levinstein appear to be more concerned with finding a *semantics* for logical counterfactuals: loosely speaking, they are looking for a uniform interpretation of such statements which shows us what they really *mean*. This is what I infer from the papers on theories of counterlogicals that *Death in Damascus* references, such as Bjerring's *Counterpossibles*, which tries to extend Lewis' possible-worlds semantics for counterfactuals to logical counterfactuals.
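A minimal sketch of why possible-worlds semantics needs extending here. On Lewis' truth condition, "if A were the case, C would be the case" is true iff C holds at all the closest A-worlds, and vacuously true when there are no A-worlds. If the agent's algorithm deterministically outputs 1, no possible world has it output 2, so every counterfactual about output 2 comes out true. The sketch models worlds as sets of sentences and adds hand-labelled "impossible" worlds, one common response in the literature; it is not a reconstruction of Bjerring's proposal, and the hard-coded distances are hypothetical stand-ins for a similarity ordering.

```python
# Worlds are sets of sentences plus a hand-assigned distance from actuality.
ACTUAL = {"A()=1", "U=10"}            # the agent's algorithm A outputs 1
POSSIBLE = [
    {"facts": ACTUAL, "dist": 0},
    {"facts": ACTUAL | {"rain"}, "dist": 1},
]
IMPOSSIBLE = [                        # worlds where A() outputs 2
    {"facts": {"A()=2", "U=0"}, "dist": 2},
    {"facts": {"A()=2", "U=1000"}, "dist": 3},
]


def would(worlds, antecedent, consequent):
    """Lewis-style counterfactual: the consequent holds at every closest
    antecedent-world; vacuously true if there are no antecedent-worlds."""
    a_worlds = [w for w in worlds if antecedent in w["facts"]]
    if not a_worlds:
        return True
    best = min(w["dist"] for w in a_worlds)
    return all(consequent in w["facts"] for w in a_worlds if w["dist"] == best)


# Possible worlds only: both counterpossibles are vacuously true.
print(would(POSSIBLE, "A()=2", "U=0"), would(POSSIBLE, "A()=2", "U=1000"))  # True True
# Adding impossible worlds lets the closest A()=2 world decide.
print(would(POSSIBLE + IMPOSSIBLE, "A()=2", "U=0"))     # True
print(would(POSSIBLE + IMPOSSIBLE, "A()=2", "U=1000"))  # False
```

Choosing those distances in a principled way, rather than by hand, is roughly the problem of giving counterlogicals a semantics.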

The approaches Soares and Levinstein refer to do not suffice for their purposes. However, this is not because these approaches give proof-theoretic considerations a wide berth, as I previously thought; I now believe that there is something the semantic approach does get *right*.

Suppose that we found some paraconsistent logical system that permitted precisely the counterlogical inferences that agree with our intuitions: would this theory of logical counterfactuals satisfy the demands of a fully specified functional decision theory? I would argue not. Specifically, this approach seems to give us a merely a posteriori, ad hoc justification for our theory. Given that we are working towards an *idealised* decision theory, we ought to demand that the theory supporting it be *fully motivated*. Here, this cashes out as making sure that our counterlogical statements and inferences have a *meaning*: a meaning we can appeal to in order to settle the truth-values of these counterlogicals.

This is not to say that investigating different logical systems is entirely futile: indeed, if the consideration above were true and a paraconsistent logic could fit the bill, then a uniform semantics for the system would serve as the last piece of the puzzle.

In the next year, I would like to investigate the problem of finding a full theory of logical counterfactuals, such that it may become a tool to be applied to a functional decision theory. This will, of course, involve finding the logical system that best captures our own reasoning about logical counterfactuals. Nonetheless, I will now also seek to find an actual motivation for whichever system seems the most promising, and the best way to find this motivation will be through finding an adequate semantics. I would welcome any suggestions in the comments about where to start looking for such a theory, or if any avenues have thus far proved to be more or less promising.

## 3 comments


My guess is that finding a fully satisfactory solution is hopeless, in much the same way as with specifying aligned goals (i.e. no solution is in closed form, without reference to human-derived systems doing decision theory/axiology).

A crucial problem is working out how an agent's decisions influence a given situation, but that situation can include things that reason approximately about the agent and, worse, things that reason about different but similar agents. An agent's decision influences not just precise predictions of itself, but also approximate (and sometimes incorrect) guesses about it, and approximate guesses about similar decisions of similar agents. Judging how a decision influences a system that wrongly guesses the decision of a similar but different agent seems "arbitrary" in the same way that human goals are "arbitrary": that is, not arbitrary at all, but in practice impossible to express without reference to the philosophy of human-derived things.

Another practical solution might be to characterize a class of situations where decision theory is mostly clear, and make sure to keep the world that way until more general decision theory is developed. This direction can benefit from more general decision theories, but they won't be "fully general", just describe more situations or understand the familiar situations better. (See also.)

The reason we want a description of counterfactuals is to allow for a model of the world where we plug in counterfactual actions and get back the expected outcome, allowing us to choose between actions/strategies. Counterfactuals don't have any reality outside of how we think about them.

Thus, the motivation for an improvement to the causal-intervention model of counterfactuals is not that it should correspond to some external reality, but that it should help reach good outcomes. We can still try to make a descriptive model of how humans do logical counterfactual reasoning, but our end goal should be to understand why something like that actually leads to good outcomes.

It's important to note that it's okay to use human reasoning to validate something that is supposedly not just a descriptive model of human reasoning. Sure, it creates selection bias, but what other reasoning are we going to use? See Neurath's Boat (improving philosophy is like rebuilding a boat piece by piece while adrift at sea) and Ironism (awareness and acceptance of the contingency of our beliefs).

In the end, I suspect that what counts as a good model for predicting outcomes of actions will vary strongly depending on the environment. See related rambling by me, particularly part 4. This is more related to Scott Garrabrant's logical inductors in hindsight than it was in any kind of foresight.

(Disclaimer: There's a good chance you've already thought about this.)

In general, if you want to understand a system (construal of meaning), forming a model of the output of that system (truth conditions and felicity judgements) is very helpful. So if you're interested in understanding how counterfactual statements are interpreted, I think the formal semantics literature is the right place to start (try digging through the references here, for example).

**[deleted]** · 2017-09-22T11:37:27.069Z

# Sudoku-like counterfactuals

In a brute-force search with backtracking, you try several mutually exclusive possibilities, most of which are counterfactual.

When playing sudoku, you reason "there could be a 1 in this cell"; if this leads to a contradiction, you deduce that there is no 1 there.

Dealing with a contradiction is easy: you deduce that your assumption was *indeed* counterfactual.

Most logics support this kind of reasoning.
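A minimal sketch of this on a 4×4 grid with 2×2 boxes, assuming the usual Sudoku rules; `fits`, `solve` and the example puzzle are illustrative. Each tentative placement is an assumption; when it leads to a dead end it is withdrawn and recorded. Such a refutation is only relative to the assumptions still on the stack, not an absolute deduction.

```python
def fits(grid, r, c, d):
    """True if digit d can go at (r, c) without repeating in its row, column or 2x2 box."""
    if d in grid[r] or any(grid[i][c] == d for i in range(4)):
        return False
    br, bc = 2 * (r // 2), 2 * (c // 2)
    return all(grid[i][j] != d for i in range(br, br + 2) for j in range(bc, bc + 2))


def solve(grid, refuted):
    """Fill empty cells (0) in place by backtracking; return True on success.

    Every tentative placement is an assumption. If it leads to a dead end,
    it is withdrawn and appended to `refuted` as (row, col, digit).
    """
    for r in range(4):
        for c in range(4):
            if grid[r][c] == 0:
                for d in range(1, 5):
                    if fits(grid, r, c, d):
                        grid[r][c] = d           # assume "there is a d here"
                        if solve(grid, refuted):
                            return True
                        grid[r][c] = 0           # assumption led to a contradiction
                        refuted.append((r, c, d))
                return False                     # no digit fits: this branch is inconsistent
    return True                                  # no empty cells left


puzzle = [
    [0, 0, 4, 0],
    [0, 0, 0, 1],
    [0, 2, 0, 0],
    [3, 0, 0, 2],
]
refuted = []
print(solve(puzzle, refuted))  # True
print(puzzle)   # [[2, 1, 4, 3], [4, 3, 2, 1], [1, 2, 3, 4], [3, 4, 1, 2]]
print(refuted)  # [(0, 1, 3), (0, 0, 1)]
```

Here the solver first assumes a 1 in the top-left corner, runs into a dead end in the first row, withdraws the assumption, and succeeds with a 2 instead.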

# Metaphorical counterfactuals

This is the everyday counterfactual, usually in the form "What would happen if **X**...?".

It is stated as counterfactual, but understanding it as a counterfactual is a linguistic confusion. The correct phrasing would be "Given a theory that *the world* models, what can we deduce if we assume **X**?".

The first is about the world; the second is about the theory, and is well understood by current logics. Furthermore, if the second approach yields a contradiction, there are two possibilities:

- There is a problem with your assumption, and you learn it. (The Sudoku-like situation.)

- There is a problem with your theory, because you know the assumption is possible. The problem might be that your theory is too specific and overfits the facts about the world.
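The second phrasing can be sketched directly: forward-chain from the theory plus the assumption, and look for a sentence derived together with its negation. This is a minimal sketch assuming a theory made of Horn rules, with negation encoded by a hypothetical `not_` prefix; `closure`, `what_if` and the match example are illustrative.

```python
def closure(facts, rules):
    """Forward-chain Horn rules (premises, conclusion) until nothing new follows."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


def what_if(theory_facts, rules, assumptions):
    """Given the theory, what can we deduce if we assume `assumptions`?

    Returns the derived sentences and the clashes, i.e. sentences s such
    that both s and "not_" + s were derived.
    """
    derived = closure(set(theory_facts) | set(assumptions), rules)
    clashes = {s for s in derived if "not_" + s in derived}
    return derived, clashes


facts = {"match_is_dry"}
rules = [
    (("strike", "match_is_dry"), "lit"),
    (("underwater",), "not_lit"),
]

derived, clashes = what_if(facts, rules, {"strike"})
print("lit" in derived, clashes)  # True set()

derived, clashes = what_if(facts, rules, {"strike", "underwater"})
print(clashes)  # {'lit'}
```

A non-empty `clashes` set is where the two options diverge: either drop the assumption (conclude the match was not struck underwater), or, if you know that situation is possible, revise the theory (for instance, make the first rule also require "not_underwater").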

# Causality related counterfactuals

These lead to paradoxes not because of a problem with counterfactuals, but because of a poor understanding of causality itself. From my point of view, this is the problem with most decision theories. (I haven't read much about them; mostly some of MIRI's papers.)

Grokking the halting problem and its associated arithmetical hierarchy might help. Basically, predicting a predictor of equivalent power runs into the same diagonal-argument impossibility.

It becomes possible if the predictor doesn't communicate its result to the agent being predicted.
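A toy version of the diagonal argument, assuming the predictor announces its prediction before the agent acts; the `contrarian` agent and the other function names are hypothetical. Whatever is announced, the contrarian does the opposite, so no announced prediction can be correct, whereas a silent predictor can simply run the agent on the input it will actually receive.

```python
from typing import Callable

# An agent maps what it is told about the prediction to an action.
Agent = Callable[[str], str]


def contrarian(announced: str) -> str:
    """Do the opposite of whatever the announced prediction says."""
    return "two_box" if announced == "one_box" else "one_box"


def announced_prediction_is_correct(prediction: str, agent: Agent) -> bool:
    """The predictor tells the agent its prediction, then the agent acts."""
    return agent(prediction) == prediction


def silent_prediction(agent: Agent) -> str:
    """Without announcing anything, the predictor simulates the agent on the
    same input the agent will actually see."""
    return agent("no_announcement")


print([announced_prediction_is_correct(p, contrarian) for p in ("one_box", "two_box")])
# [False, False]
print(silent_prediction(contrarian) == contrarian("no_announcement"))  # True
```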

The intermediate situations on this spectrum are basically hard problems that get solved through the fixed points implicit in the assumption that "the predictor accurately predicts" (probabilities only obscure this fact); hence the paradox.

It's similar to computing with a restricted form of time travel, closed timelike curves (CTCs), which has already been treated in a paper by Scott Aaronson. From its abstract:

> Following the work of Deutsch, we treat a CTC as simply a region of spacetime where a “causal consistency” condition is imposed, meaning that Nature has to produce a (probabilistic or quantum) fixed-point of some evolution operator. Our conclusion is then a consequence of the following theorem: given any quantum circuit (not necessarily unitary), a fixed-point of the circuit can be (implicitly) computed in polynomial space.
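A classical, probabilistic analogue of this consistency condition can be sketched in a few lines; it is not the quantum construction from the paper. Given a deterministic evolution `f` on a finite set of states, the uniform distribution over a cycle of `f` is a fixed point of `f` acting on distributions, even when `f` has no fixed state. Negating a bit sent back in time (the grandfather paradox) has no consistent deterministic history, yet the uniform distribution over {0, 1} is consistent. The function names are hypothetical.

```python
from fractions import Fraction


def pushforward(f, dist):
    """Push a probability distribution through a deterministic evolution f."""
    out = {}
    for state, p in dist.items():
        out[f(state)] = out.get(f(state), 0) + p
    return out


def causally_consistent_distribution(f, start):
    """Follow f from `start` until a state repeats (f must act on a finite set),
    then return the uniform distribution over the cycle that was reached.
    Such a distribution D satisfies pushforward(f, D) == D."""
    seen = []
    state = start
    while state not in seen:
        seen.append(state)
        state = f(state)
    cycle = seen[seen.index(state):]
    return {s: Fraction(1, len(cycle)) for s in cycle}


def negate(bit):
    """Grandfather paradox: the bit that comes back is flipped."""
    return 1 - bit


D = causally_consistent_distribution(negate, 0)
print(D)                            # {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(pushforward(negate, D) == D)  # True
```

This naive search stores the whole orbit, so it is nowhere near the polynomial-space bound in the abstract; it only illustrates what a probabilistic fixed point of an evolution is.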