# Uncertainty can Defuse Logical Explosions

post by Jemist · 2021-07-30T12:36:29.875Z · LW · GW · 7 comments

### Principle of Explosion

The principle of explosion goes as follows: if we assume contradictory axioms then we can derive everything. Statements prefixed with A are axioms or assumptions; those prefixed with D are derived from them.

**A1** $P$

**A2** $\neg P$

**D2** $P \lor Q$, from **A1**

**D3** $Q$, from **A2** and **D2**

In English: we assume both that $P$ is true and that $P$ is not true for some $P$. Then for any $Q$, since we know $P$, we have that ($P$ or $Q$) is true. But since we know that not-$P$ is true, we know $Q$ must be true.
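The same derivation can be checked mechanically. Below is a minimal Lean 4 sketch; the hypothesis names `a1`, `a2` and `d2` are chosen here only to mirror the labels above, and the proof uses nothing beyond the core `Or.inl`, `Or.elim` and `absurd`.

```lean
-- Principle of explosion: from P and ¬P, any Q follows.
example (P Q : Prop) (a1 : P) (a2 : ¬P) : Q :=
  -- D2: P ∨ Q, by disjunction introduction from A1.
  have d2 : P ∨ Q := Or.inl a1
  -- D3: Q. The left branch of D2 contradicts A2, so only the right branch survives.
  d2.elim (fun hp => absurd hp a2) (fun hq => hq)
```

Note that `Q` is completely unconstrained, which is exactly what makes the explosion dangerous.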

This can cause issues in counterfactual reasoning. In the five-and-ten problem, an agent considering the counterfactual "I take $5" might fall prey to the following:

**A1** I am an agent which will take the greater of $5 and $10

**A2** $10 > $5

**A3** I take $5

**D4** Either (\$5 > \$10) or $Q$

**D5** $Q$, from **A2** and **D4**

This makes it impossible for the agent to reason about worlds in which its actions contradict its knowledge of itself. This can be thought of as a logical collision: axioms 1 and 2 collide with counterfactual 3. I think this is related to issues of Löb's theorem but not quite the same. In Löbian problems the agent must reason about itself as a proof-searching machine; here the agent only needs to reason about itself as a rational agent. Obviously humans do not normally make mistakes this egregious.

In mathematics, the Riemann Hypothesis has yet to be solved, that is, proved or disproved. A few different theorems have been proven true by an ingenious method: prove them in the case that the Riemann Hypothesis is true, then prove them in the case that it is false! This does the job of proving the theorem in question, but it introduces a wrinkle: one of the two proofs starts from a false assumption, so it might well be explosion-like. This illustrates that there is no good reason for the contradiction to be obvious. In fact, if the Riemann Hypothesis is (as some people have suggested) undecidable from our axioms, then there will be no logical contradiction involved in the derivation of either one!

What does it even mean for a proof to be explosion-like? We can make logical-looking proofs based on incorrect assumptions. Even applying a general theorem to a special case does this. The proof that every integer $n > 1$ has a prime factorization starts from two cases: $n$ is prime, or $n$ is not prime. Applying it to 345 is still valid, even though one branch of the proof takes the (contradictory) assumption that 345 is prime.
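As a rough illustration, the two-case proof can be read as a recursive procedure. This is only a sketch: the function names `is_prime` and `prime_factors` are made up for this example, and trial division is used purely for simplicity.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; fine for small illustrative inputs."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


def prime_factors(n: int) -> list[int]:
    """Follow the two-case proof: either n is prime, or n splits into smaller factors."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    if is_prime(n):
        # Case 1: n is prime, so n is its own factorization.
        return [n]
    # Case 2: n is composite, so it has a divisor a with 1 < a < n; recurse on both parts.
    a = next(d for d in range(2, n) if n % d == 0)
    return prime_factors(a) + prime_factors(n // a)


print(prime_factors(345))  # [3, 5, 23]
```

The general procedure carries a "345 is prime" branch even though, for this particular input, that branch is never taken.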

### Agents and Uncertainty

An agent able to think probabilistically might use beliefs rather than axioms. In the second example above (the five-and-ten problem), statements 1 and 2 become beliefs, and 3 is taken as a counterfactual. I think it helps to treat counterfactuals as distinct from beliefs: they are objects used to reason about the world, and they do not have probabilities assigned to them. Each counterfactual does, however, correspond to a belief, which has its own probability assigned.

If the agent notices that 1 and 2 are inconsistent with the counterfactual under consideration, it can use the uncertainty of beliefs to make sense of the counterfactual: ignore the proportion of probability mass assigned to worlds where 1 and 2 are both true. This *defuses* the logical collision. As long as it is sensible enough to not assign probability 1 to its beliefs it can reason sensibly again. This might look like the following:

**B1** I am an agent which will take the greater of $5 and $10 (probability 0.99)

**B2** $10 > $5 (probability 0.9999)

**C3** I take $5

**D4** $\neg(\text{B1} \land \text{B2})$, from **C3**

The defusing does two things. First, it prevents the explosion: we can no longer derive everything, so we can work successfully from counterfactuals.

Second, it allows the agent to (outside of the line of reasoning where C3 is a counterfactual) derive that the belief corresponding to C3 is unlikely, by working backwards. The agent now has information about its own future decisions, but in a way which does not cause the logical explosion from earlier.
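One way to picture the sandboxed reasoning is as conditioning over the four possible truth assignments to B1 and B2. The sketch below is illustrative only: it assumes B1 and B2 are independent (the post does not say this), and the function name `sandbox_posterior` is hypothetical.

```python
from itertools import product


def sandbox_posterior(p_b1: float, p_b2: float) -> dict[tuple[bool, bool], float]:
    """Probabilities over (B1, B2) worlds inside the counterfactual 'I take $5'."""
    weights = {}
    for b1, b2 in product([True, False], repeat=2):
        # Prior weight of this world, ASSUMING B1 and B2 are independent.
        weight = (p_b1 if b1 else 1 - p_b1) * (p_b2 if b2 else 1 - p_b2)
        # D4: the counterfactual rules out the world where B1 and B2 both hold.
        if b1 and b2:
            weight = 0.0
        weights[(b1, b2)] = weight
    total = sum(weights.values())
    if total == 0:
        # Beliefs held with probability 1 leave no surviving world: the explosion returns.
        raise ValueError("no probability mass left; the counterfactual collides with certainty")
    return {world: w / total for world, w in weights.items()}


posterior = sandbox_posterior(p_b1=0.99, p_b2=0.9999)
for world, prob in posterior.items():
    print(world, round(prob, 4))
```

With the probabilities from B1 and B2, almost all of the surviving mass lands on the world where B1 is false and B2 is true. Passing probability 1 for both beliefs leaves nothing to renormalize, which is the certainty case the text warns against.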

If we are reasoning about maths and we take the counterfactual "345 is prime", we can construct a line of reasoning going back to our beliefs about maths which creates a contradiction.

It seems reasonable for the agent to then do two things. First, in the counterfactual sandbox where 345 is prime, it must assign some probability to each step in its reasoning being incorrect, and some probability to each of its axioms being incorrect (say 5% each if there are twenty steps including the axioms). Second, it can go back and assign a very small probability to the belief corresponding to the counterfactual "345 is prime" (since in belief-space each step in the reasoning has a much smaller than 5% chance of being incorrect). If it uses some sort of beam search over possible mathematical proofs, it can then avoid allocating resources to the case that 345 is prime in future. This seems more like how humans reason.
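The arithmetic behind those two figures can be sketched under simplifying assumptions that are not in the original argument: the derivation has $n = 20$ steps, each step fails independently with the same small probability $\varepsilon$, and $E_i$ denotes the event that step $i$ is wrong.

$$
\begin{aligned}
P(\text{345 is prime}) &\le P\Big(\bigcup_{i=1}^{n} E_i\Big) \le n\varepsilon = 20\varepsilon, \\
P\Big(E_i \,\Big|\, \bigcup_{j=1}^{n} E_j\Big) &= \frac{\varepsilon}{1 - (1 - \varepsilon)^{n}} \approx \frac{1}{n} = 5\%.
\end{aligned}
$$

The first line is the belief-space view: if every step were correct the derivation would refute the claim, so the claim can only hold if some step failed, and the union bound caps that at $20\varepsilon$. The second line is the sandbox view, approximating "345 is prime" by "at least one step failed": the blame is then spread roughly evenly over the twenty steps, however small $\varepsilon$ is.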

When reasoning about its own behaviour, it seems like an agent should be much more uncertain about its own behaviour than about its own reasoning capabilities. The trick applied to 345 being prime earlier works with arbitrarily small chances of reasoning being incorrect.

## 7 comments

## comment by NunoSempere (Radamantis) · 2021-07-30T21:18:11.742Z · LW(p) · GW(p)

Should B2 be "$10 > $5 (probability 0.9999)"? If so, you find yourself in the situation where you have 0.99+ for two contradictory hypotheses, and it's not clear to me what the step "ignore the proportion of probability mass assigned to worlds where 1 and 2 are both true" actually looks like.

## ↑ comment by Jemist · 2021-07-30T21:53:06.830Z · LW(p) · GW(p)

> Should B2 be "$10 > $5 (probability 0.9999)"?

Yes it should be, thanks for the catch.

We only ignore that proportion of the probability mass *while thinking about the counterfactual world in which $5 is taken*. It is treated just as we would treat probability mass previously assigned to anything we now know to be impossible.

I used "ignore" to emphasize that the agent is not updating either of its beliefs about B1 or B2 based on C3. It's just reasoning in a "sandboxed" counterfactual world where it now assigns ~99% probability to it taking the lower of $5 and $10 and ~1% chance to $5 being larger than $10. From within the C3 universe it looks like a standard (albeit very strong) Bayesian update.

When it stops considering C3, it "goes back to" having strong beliefs that both B1 and B2 are true.

## ↑ comment by NunoSempere (Radamantis) · 2021-07-31T22:02:53.048Z · LW(p) · GW(p)

Can you give the probabilities that the agent assigns to B1 through D4 in the "sandboxed" counterfactual?

## ↑ comment by Jemist · 2021-08-04T18:09:58.431Z · LW(p) · GW(p)

Yeah, so there are four options: $(\text{B1} \land \text{B2})$, $(\neg\text{B1} \land \text{B2})$, $(\text{B1} \land \neg\text{B2})$, $(\neg\text{B1} \land \neg\text{B2})$. These will have the ratios $0.989901 : 0.009999 : 0.000099 : 0.000001$. By D4 we'd eliminate the first one. The remaining odds ratios are normalized to be something around $0.9901 : 0.0098 : 0.0001$. I.e. given that the agent takes \$5 instead of \$10, it is pretty sure that it's taken the smaller one for some reason, gives a tiny probability to having miscalculated which of \$5 and \$10 is larger, and a really, really small probability that both are true.

In fact, were it to reason further it would see that the fourth option is also impossible: we have an XOR-type situation on our hands. Then it would end up with odds around $0.99 : 0.01$ for the second and third options.

That last bit was assuming that it doesn't have uncertainty about its own reasoning capability.

Ideally it would also consider that D4 might be incorrect, and still assign some tiny probability $\epsilon$ (the exact value matters little; the point is that it should be pretty small) to both the first and fourth options, giving odds of roughly $\epsilon : 0.99 : 0.01 : \epsilon$. It wouldn't really consider them for the purposes of making predictions, but to avoid logical explosions, we never assign a "true" zero.

## ↑ comment by NunoSempere (Radamantis) · 2021-08-05T07:49:24.175Z · LW(p) · GW(p)

Nice!!