How counterfactual are logical counterfactuals?

post by Donald Hobson (donald-hobson) · 2024-12-15T21:16:40.515Z · LW · GW · No comments

This is a question post.

Contents

  Answers
    10 JBlack
    1 ektimo

Logical counterfactuals are when you say something like "Suppose P (some false mathematical statement) were true, what would that imply?"

They play an important role in logical decision theory.

Suppose you take a false proposition P and then take a logical counterfactual in which P is true. I am imagining this counterfactual as a function C that sends counterfactually true statements to 1 and false statements to 0.

Suppose P is "not Fermat's last theorem". In the counterfactual where Fermat's last theorem is false, I would still expect 2+2=4. Perhaps not with measure 1, but close. So C(2+2=4) ≈ 1.

On the other hand, I would expect trivial rephrasings of Fermat's last theorem to be false, or at least mostly false.

But does this counterfactual produce a specific counterexample? Does it think that some specific equation a^n + b^n = c^n holds? Or does the counterfactual insist that a counterexample exists, but spread probability over many possible counterexamples? Or does it act as if there is a non-standard number counterexample?

How would I compute the value of C(X) in general?
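To be concrete about the type signature I have in mind, here is a minimal sketch in Python. The function name, its signature, and the hard-coded values are all hypothetical; the question is precisely how to fill in the general case.

```python
# A minimal sketch of the object I have in mind: a map from (counterfactual
# premise, query statement) to a number in [0, 1]. Everything hard-coded
# below is illustrative, not a proposed algorithm.

def counterfactual(premise: str, statement: str) -> float:
    """Return C(statement): how true `statement` is in the logical
    counterfactual where `premise` holds."""
    if premise == "Fermat's last theorem is false":
        if statement == "2 + 2 = 4":
            return 0.99  # unrelated arithmetic should survive almost untouched
        if statement == "some trivial rephrasing of Fermat's last theorem":
            return 0.01  # rephrasings of the premise should flip along with it
    raise NotImplementedError("no known general algorithm for C")
```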

 

Suppose you are an LDT agent trying to work out whether to cooperate or defect in a prisoner's dilemma.

What does the defect counterfactual look like? Is it basically the same as reality, except that you in particular defect? (So exact clones of you defect, and any agent that knows your exact source code and is running detailed simulations of you will defect.)

Or is it broader than that: a counterfactual world in which all LDT agents defect in prisoner's dilemma situations in general? Is it a counterfactual world in which a bunch of Homo erectus defected on each other, and then all went extinct, leaving a world without humans?

All of the thought about logical counterfactuals I have seen so far is on toy problems that divide the world into Exact-simulations-of-you and Totally-different-from-you. 

I can't see any clear idea about what to do with agents that are vaguely similar to you, but not identical.

Answers

answer by JBlack · 2024-12-17T02:34:01.254Z · LW(p) · GW(p)

Truly logical counterfactuals really only make sense in the context of bounded rationality. That is, cases where there is a logically necessary proposition, but the agent cannot determine it within their resource bounds. Essentially all aspects of bounded rationality have no satisfactory treatment as yet.

The prisoners' dilemma question does not appear to require dealing with logical counterfactuals. It is not logically contradictory for two agents to make different choices in the same situation, or even for the same agent to make different decisions given the same situation, though the setup of some scenarios may make it very unlikely or even direct you to ignore such possibilities.

comment by Donald Hobson (donald-hobson) · 2024-12-17T12:43:43.231Z · LW(p) · GW(p)

If two Logical Decision Theory agents with perfect knowledge of each other's source code play the prisoner's dilemma, theoretically they should cooperate.

LDT uses logical counterfactuals in its decision-making.

If the agents are CDT, then logical counterfactuals are not involved.

Replies from: JBlack
comment by JBlack · 2024-12-18T00:56:14.163Z · LW(p) · GW(p)

If they have source code, then they are not perfectly rational and cannot in general implement LDT. They can at best implement a boundedly rational subset of LDT, which will have flaws.

Assume the contrary: Then each agent can verify that the other implements LDT, since perfect knowledge of the other's source code includes the knowledge that it implements LDT. In particular, each can verify that the other's code implements a consistent system that includes arithmetic, and can run the other on their own source to consequently verify that they themselves implement a consistent system that includes arithmetic. This is not possible for any consistent system.

The only way that consistency can be preserved is that at least one cannot actually verify that the other has a consistent deduction system including arithmetic. So at least one of those agents is not a LDT agent with perfect knowledge of each other's source code.

We can in principle assume perfectly rational agents that implement LDT, but they cannot be described by any algorithm and we should be extremely careful in making suppositions about what they can deduce about each other and themselves.

Replies from: Jiro
comment by Jiro · 2024-12-18T16:20:59.443Z · LW(p) · GW(p)

I get the impression that "has the agent's source code" is some Yudkowskyism which people use without thinking.

Every time someone says that, I always wonder "are you claiming that the agent that reads the source code is able to solve the Halting Problem?"

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2024-12-19T14:30:18.728Z · LW(p) · GW(p)

The halting problem is a worst-case result. Most agents aren't maximally ambiguous about whether or not they halt. And for those that are, well, it depends on what the rules are for agents that don't halt.

There are setups where each agent uses an unphysically large but finite amount of compute. There was a paper I saw somewhere a while ago where both agents did a brute-force proof search for the statement "if I cooperate, then they cooperate", and cooperated if they found a proof.

(I.e. searching all proofs containing fewer than 10^100 symbols.)
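Very roughly, the structure I remember is something like the sketch below. The `proves` helper is hypothetical and stands in for the actual brute-force proof search; the agent and claim strings are just placeholders.

```python
PROOF_LENGTH_BOUND = 10**100  # unphysically large, but finite

def proves(statement: str, bound: int) -> bool:
    """Hypothetical helper: brute-force search over every proof of at most
    `bound` symbols in some fixed formal system; True iff one of them
    proves `statement`."""
    raise NotImplementedError  # only feasible "in principle"

def agent(my_source: str, their_source: str) -> str:
    # Cooperate iff there is a bounded-length proof of
    # "if I cooperate, then they cooperate".
    claim = f"({my_source} outputs C) -> ({their_source} outputs C)"
    return "C" if proves(claim, PROOF_LENGTH_BOUND) else "D"
```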

Replies from: Jiro
comment by Jiro · 2024-12-19T16:28:24.320Z · LW(p) · GW(p)

There are setups where each agent uses an unphysically large but finite amount of compute.

In a situation where you are asking a question about an ideal reasoner, having the agents be finite means you are no longer asking about an ideal reasoner. If you put an ideal reasoner in a Newcomb problem, he may very well think "I'll simulate Omega and act according to what I find". (Or more likely, some more complicated algorithm that indirectly amounts to that.) If the agent can't do this, he may not be able to solve the problem. Of course, real humans can't, but this may just mean that real humans, because they are finite, are unable to solve some problems.

comment by Donald Hobson (donald-hobson) · 2024-12-17T12:49:53.613Z · LW(p) · GW(p)

There is a model of bounded rationality, logical induction. 

Can that be used to handle logical counterfactuals?

answer by ektimo · 2024-12-16T00:30:24.502Z · LW(p) · GW(p)

This seems like 2 questions:

  1. Can you make up mathematical counterfactuals and propagate the counterfactual to unrelated propositions? (I'd guess no. If you are just breaking a conclusion somewhere, you can't propagate it by any rules unless you specify what those rules are, in which case you have just made up a different mathematical system.)
  2. Does the identical-twin one-shot prisoner's dilemma only work if you are functionally identical, or can you be a little different, and is there anything meaningful that can be said about this? (I'm interested in this one also.)
comment by Viliam · 2024-12-17T11:55:55.120Z · LW(p) · GW(p)

Does the identical-twin one-shot prisoner's dilemma only work if you are functionally identical, or can you be a little different, and is there anything meaningful that can be said about this?

I guess it depends on how much the parts that make you "a little different" are involved in your decision making.

If you can put it in numbers, for example -- I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q; also I care about the well-being of my twin with a coefficient e, and my twin cares about my well-being with a coefficient f -- then you could take the payout matrix and these numbers, and calculate the correct strategy.

Option one: what if you cooperate? Your payout is the C-C payout with probability p, and the C-D payout with probability 1-p; your twin's payout is the C-C payout with probability p, and the D-C payout with probability 1-p. Then you multiply your twin's payout by your empathy e and add that to your payout, etc. Okay, that is option one; now do the same for option two, and then compare the numbers.
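A rough sketch of that calculation, assuming the standard placeholder payoff numbers (the variable names are just illustrative):

```python
# Payoffs indexed by (my move, twin's move) -> (my payout, twin's payout).
# The numbers are the usual placeholder prisoner's dilemma values.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def value_of(my_move, prob_twin_matches, empathy):
    """Expected utility of `my_move`, where the twin copies it with the
    given probability and I weight their payout by `empathy` (e).
    The twin's coefficient f would enter the twin's own calculation,
    not this one."""
    twin_other = "D" if my_move == "C" else "C"
    total = 0.0
    for twin_move, prob in ((my_move, prob_twin_matches),
                            (twin_other, 1 - prob_twin_matches)):
        mine, theirs = PAYOFFS[(my_move, twin_move)]
        total += prob * (mine + empathy * theirs)
    return total

p, q, e = 0.95, 0.95, 0.5  # example numbers
cooperate_value = value_of("C", p, e)
defect_value = value_of("D", q, e)
best = "cooperate" if cooperate_value > defect_value else "defect"
```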

It gets way more complicated when you cannot make a straightforward estimate of the probabilities, because the algorithms are too complicated. It might even be impossible to find a fully general solution (because of the halting problem).

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2024-12-17T12:48:42.204Z · LW(p) · GW(p)

I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q;

 

And here the main difficulty pops up again. There is no causal connection between your choice and their choice. Any correlation is a logical one. So imagine I make a copy of you. But the copying machine isn't perfect. A random 0.001% of neurons are deleted. Also, you know you aren't a copy. How would you calculate those probabilities p and q? Even in principle.
