# Counterfactuals, trolljectures, and a proposal for logical causality

# Background

While discussing logical counterfactuals at MSFP, I came up with an idea that Scott pointed out was basically a more formal statement of his trolljecture. This was somewhat distressing, especially because the trolljecture seemed to be basically true! After getting Patrick to explain the counterexamples to me, I abandoned the idea for a bit and began searching for other approaches.

The essential idea behind all the trolljecture counterexamples, at least as I understand them, was that any system of logical counterfactuals based on provability would allow unintended backward reasoning, forcing the counterfactual to take a stance on seemingly unrelated statements. For example, in Sam’s original counterexample, we can reason backwards from the counterfactual assumption that *A*() = 2 to the conclusion that its proof search must have terminated, and thereby derive the inconsistency of PA.

Patrick later outlined a revival of the trolljecture that additionally assumes the consistency of PA, and which seemed to do the right thing on Sam’s counterexample. This revision was, however, still vulnerable to a new counterexample that built *U*() around a statement *X* with neither a short proof nor a short disproof. The revised trolljecture still ended up taking a stance on the truth of *X*, through essentially the same argument as before, unimpeded by the consistency assumption.

In this post I propose a formulation of logical counterfactuals that I haven’t managed to shoot down yet. I expect this proposal to break too, but I haven’t been able to figure out how. I’ll edit this post to include a link to a counterexample once one has been found.

Before explaining my proposal, I think it would be helpful to look at how Pearlian causality handles some tricky problems.

# A causal detour

You can imagine a situation in causal counterfactuals that feels awfully similar to the trolljecture counterexamples. Suppose I have an agent *A* choosing which boxes to open. Box 1 contains 5 utility, along with explosives rigged to detonate when the box is opened, destroying the contents of both boxes. Box 2 contains 5 utility and no explosives. Suppose *A* is risk averse: it will open both boxes if it observes the explosives being disarmed, and will otherwise open only box 2. Unbeknownst to *A*, you had previously disarmed the explosives in box 1. Since *A* never sees the explosives being disarmed, it opens only box 2. Knowing that the explosives had been disarmed, you can reason that if *A* had opened both boxes it would have walked away with 10 utility instead of 5.
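The reasoning above can be sketched as a tiny structural model. This is a minimal illustration under assumed variable names (`disarmed`, `opens_box1`, etc. are mine, not from the post): we hold the background fact that the explosives were disarmed fixed, sever the link from observation to action, and set the action directly.

```python
def utility(disarmed, opens_box1, opens_box2):
    """Total utility: each opened box pays 5, but opening a live box 1
    destroys the contents of both boxes."""
    if opens_box1 and not disarmed:
        return 0  # explosion destroys both boxes' contents
    return 5 * opens_box1 + 5 * opens_box2

def agent_action(observes_disarming):
    """A is risk averse: it opens box 1 only if it saw the disarming."""
    opens_box1 = observes_disarming
    opens_box2 = True
    return opens_box1, opens_box2

# Actual world: explosives disarmed, but A never observes the disarming.
disarmed = True
opens_box1, opens_box2 = agent_action(observes_disarming=False)
actual = utility(disarmed, opens_box1, opens_box2)

# Counterfactual surgery: ignore the observation-to-action mechanism and
# set the action directly, keeping the background fact (disarmed) fixed.
counterfactual = utility(disarmed, opens_box1=True, opens_box2=True)

print(actual, counterfactual)  # 5 10
```

The key point is that the intervention only overwrites the action node; nothing downstream of the surgery is allowed to rewrite the already-fixed fact that you disarmed the explosives.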

In this case, the problem is solvable because empirical causality forms a DAG, in which causal influence can only propagate forward, never backward. The problem in logical counterfactuals is exactly this backward-leaking influence: hypothesizing that *A* opened box 1 forces you to conclude that it observed you disarming the explosives. This suggests that if we could formulate a proper notion of a DAG for logical causality, then maybe we’d be able to do proper logical counterfactuals.

I had been thinking about this problem for a little while when I noticed that I was confused about how circular causality could work in Pearl’s framework. For an example of circular causality, imagine two identical robots, each equipped with a light sensor and a laser pointed at the other robot’s sensor. Suppose each robot is programmed to fire its laser when it detects light in its sensor. A naive encoding of the relationship between these two events would put a cycle in the graph, with an edge from laser_{1} to laser_{2} and another from laser_{2} to laser_{1}. This seems like a perfectly valid causal structure, but it’s not a DAG!

Pearlian causality gets around this by rephrasing the event “laser_{1} firing” as “laser_{1} firing at time *t*”, with an arrow coming in from “laser_{2} firing at time *t* − 1” and an arrow going out to “laser_{2} firing at time *t* + 1”. Cycles are thereby replaced with criss-crossing chains, laddering off into the future.
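The unrolling can be made concrete with a short sketch (my own toy encoding, not anything from Pearl’s formalism directly): each node is “laser_{i} fires at time *t*”, and every edge crosses exactly one time step forward, so the unrolled graph is acyclic by construction.

```python
def unroll(initial, steps):
    """initial: a pair (laser_1 fires at t=0, laser_2 fires at t=0).
    Each robot fires at time t iff the other robot fired at t - 1,
    so influence only ever flows forward in time."""
    history = [initial]
    for _ in range(steps):
        l1, l2 = history[-1]
        history.append((l2, l1))  # criss-crossing edges, one step forward
    return history

# Kick things off by shining a light into robot 1's sensor at t = 0.
print(unroll((True, False), 4))
```

The apparent cycle between the two lasers becomes a pair of chains alternating between the robots, which is exactly the ladder structure described above.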

# The proposal

In order to break the cycles, we want some notion of logical time from which influence can only propagate forward. Proof length seems, at least to me, to be the most (only?) natural way to pull something like that off. A proof of length *k* can be thought of as a logical observation at time *k*.

To counterfact on a statement φ when we’ve already observed (i.e., proven) ¬φ (or on ¬φ when we’ve already observed φ), go back to the point where you first observed ¬φ and replace that observation with φ. If you haven’t yet observed φ or ¬φ, you may simply assume φ.

From there, derive forward as you otherwise would have, but censor any contradictions (this forces the consistency of the counterfactual world even though all the prerequisites of its negation hold).
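The surgery-then-censor procedure can be sketched in toy form. This is only an illustration under heavy simplifying assumptions of my own: statements are strings, “observations” arrive as a list ordered by proof length, and derivation is closure under a hypothetical rule set mapping premises to a conclusion; it is not a real proof system.

```python
def negation(s):
    """Toy negation on string-valued statements."""
    return s[1:] if s.startswith("~") else "~" + s

def counterfact(observations, phi, neg_phi, rules):
    """observations: statements in order of increasing proof length.
    Go back to the first observation of neg_phi, replace it with phi,
    then re-derive forward, censoring contradictions as they arise."""
    if neg_phi in observations:
        cut = observations.index(neg_phi)          # first logical time we saw ~phi
        world = observations[:cut] + [phi]         # surgery at that time step
    else:
        world = list(observations) + [phi]         # never observed: just assume phi
    changed = True
    while changed:                                 # derive forward to closure
        changed = False
        for premises, conclusion in rules:
            if all(p in world for p in premises) and conclusion not in world:
                if negation(conclusion) in world:
                    continue                       # censored: world stays consistent
                world.append(conclusion)
                changed = True
    return world
```

For instance, with a single hypothetical rule deriving *U*() = 10 from *A*() = 2, `counterfact(["Con(PA)", "A()=1"], "A()=2", "A()=1", [(["A()=2"], "U()=10")])` rolls back the observation of *A*() = 1, keeps the earlier observation of consistency, and derives the counterfactual utility, mirroring the intended behavior on Sam’s counterexample below.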

This breaks cycles in essentially the same way causal counterfactuals do: by performing the counterfactual surgery at a particular time step.

# Behavior on previous counterexamples

In general, this proposal seems to do the right thing once it already knows the outcome.

On Sam’s original counterexample, if you’ve already observed that *A*() = 1, and thus the consistency of PA, then you can correctly reason that if counterfactually *A*() = 2 then *U*() = 10.

On the counterexample to Patrick’s reformulation, again if you’ve already observed that *A*() = 1 then you can evaluate the counterfactual where *A*() = 2 without proving that *X* must have a short proof.

If you haven’t yet observed *A*() = 1, then this notion of counterfactuals does still make statements about how *A*() = 2 ends up happening. I think this is much more reasonable: you aren’t forced into the backward reasoning by the framework; it only happens once you already know what happened.