https://stanford.library.sydney.edu.au/archives/fall2008/entries/causation-counterfactual/
Counterfactual Theories of Causation
First published Wed Jan 10, 2001; substantive revision Sun Mar 30, 2008
The basic idea of
counterfactual theories of causation is that the meaning of causal claims can
be explained in terms of counterfactual conditionals of the form “If A had
not occurred, C would not have occurred”. While counterfactual
analyses have been given of type-causal concepts, most counterfactual analyses
have focused on singular causal or token-causal claims of the form “event c caused
event e”. Analyses of token-causation have become popular in the
last thirty years, especially since the development in the 1970's of possible
world semantics for counterfactuals. The best known
counterfactual analysis of causation is David Lewis's (1973b) theory. However,
intense discussion over thirty years has cast doubt on the adequacy of any
simple analysis of singular causation in terms of counterfactuals. Recent years
have seen a proliferation of different refinements of the basic idea to achieve
a closer match with commonsense judgements about causation.
1. Early Counterfactual Theories
The first explicit
definition of causation in terms of counterfactuals was, surprisingly enough,
given by Hume, when he wrote: “We may define a cause to be an object
followed by another, and where all the objects, similar to
the first, are followed by objects similar to the second. Or, in other
words, where, if the first object had not been, the second never had
existed.” (1748, Section VII). It is difficult to understand how Hume could
have confused the first, regularity definition with the second, very different
counterfactual definition.
At any rate, Hume never
explored the alternative counterfactual approach to causation. In this, as in
much else, he was followed by generations of empiricist philosophers. The chief
obstacle in empiricists' minds to explaining causation in terms of
counterfactuals was the obscurity of counterfactuals themselves, owing chiefly
to their reference to unactualised possibilities.
Starting with J. S. Mill (1843), empiricists tried to analyse
counterfactuals ‘metalinguistically’ in terms of
implication relations between statements. The rough idea is that a
counterfactual of the form “If it had been the case that A, it would have been
the case that C” is true if and only if there is an auxiliary
set S of true statements consistent with the antecedent A,
such that the members of S, when conjoined with A,
imply the consequent C. Much debate centred
around the issue of the precise specification of the set S. (See N.
Goodman 1947.) Most empiricists agreed that S would have to
include statements of laws of nature, while some thought that it would have to
include statements of singular causation. While the truth conditions of
counterfactuals remained obscure in these ways, few empiricists thought it
worthwhile to try to explain causation via counterfactuals.
Indeed, the first real
attempts to present rigorous counterfactual analyses of causation came only in
the late 1960's. (See A. Lyon 1967.) Typical of these attempts was J. L.
Mackie's counterfactual analysis in Chapter 2 of his seminal book The
Cement of the Universe (1974). As well as offering a sophisticated
regularity theory of causation ‘in the objects’, Mackie presented a
counterfactual account of the concept of a cause as “what makes the difference
in relation to some background or causal field” (1980, p.xi).
Mackie's account of the concept of causation is rich in insights, especially concerning
its relativity to a field of background conditions. However, his account never
gained as much attention as his regularity theory of causation ‘in the
objects’, no doubt because his view of counterfactuals (in his (1973)), as
condensed arguments that do not have truth values, compounded empiricists' scepticism about counterfactuals.
The true potential of the
counterfactual approach to causation did not become clear until counterfactuals
became better understood through the development of possible world semantics in
the early 1970's.
2. Lewis's 1973 Counterfactual Analysis
The best known and most
thoroughly elaborated counterfactual theory of causation is David Lewis's
theory in his (1973b), which was refined and extended in articles subsequently
collected in his (1986a). In response to doubts about the theory's treatment of
preemption, Lewis subsequently proposed a fairly radical
revision of the theory. (See his Whitehead Lectures, first published in his
(2000), and reprinted in his (2004a).) In this section we shall confine our
attention to the original 1973 theory, deferring the later changes he proposed
for consideration below.
2.1 Counterfactuals and Causal Dependence
Like most contemporary
counterfactual theories, Lewis's theory employs a possible world semantics for
counterfactuals. Such a semantics states truth conditions for counterfactuals
in terms of similarity relations between possible worlds. Lewis famously
espouses a realism about possible worlds, according to which non-actual
possible worlds are real concrete entities on a par with the actual world. (See
Lewis's defence of modal realism in his (1986e).)
However, most contemporary philosophers would seek to deploy the explanatorily
fruitful possible worlds framework while distancing themselves from full-blown
realism about possible worlds themselves. For example, many would propose to
understand possible worlds as maximally consistent sets of propositions; or
even to treat them instrumentally as useful theoretical entities having no
independent reality.
The central notion of a
possible world semantics for counterfactuals is a relation of comparative
similarity between worlds (Lewis 1973a). One world is said to be closer
to actuality than another if the first resembles the actual world more
than the second does. Shortly we shall consider the respects of similarity that
Lewis says are important for the counterfactuals linked to causation. For now we simply note two formal constraints he imposes on this
similarity relation. First, the relation of similarity produces a weak ordering
of worlds so that any two worlds can be ordered with respect to their closeness
to the actual world, with allowance being made for ties in closeness. Secondly,
the actual world is closest to actuality, resembling itself more than any other
world resembles it.
In terms of this
similarity relation, the truth condition for the counterfactual “If A were
(or had been) the case, C would be (or have been) the case”,
is stated as follows:
(1) “If A were the case, C would
be the case” is true in the actual world if and only if (i)
there are no possible A-worlds; or (ii) some A-world
where C holds is closer to the actual world than is
any A-world where C does not hold.
We shall ignore the first
case in which the counterfactual is vacuously true. The fundamental idea of
this analysis is that the counterfactual “If A were the
case, C would be the case” is true just in case it takes less
of a departure from actuality to make the antecedent true along with the
consequent than to make the antecedent true without the consequent.
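Truth condition (1) can be made concrete in a toy setting. The following Python sketch assumes a finite set of worlds, each assigned a numeric distance from actuality (0 for the actual world). The names `World` and `would`, the numeric distances, and the match-striking example are illustrative assumptions and are not part of Lewis's theory.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

@dataclass(frozen=True)
class World:
    name: str
    distance: int           # assumed measure of departure from actuality; 0 = actual world
    facts: FrozenSet[str]   # atomic propositions true at this world

Prop = Callable[[World], bool]

def would(a: Prop, c: Prop, worlds: List[World]) -> bool:
    """Condition (1): 'If A were the case, C would be the case'."""
    a_worlds = [w for w in worlds if a(w)]
    if not a_worlds:
        return True  # clause (i): no A-worlds, so vacuously true
    with_c = [w.distance for w in a_worlds if c(w)]
    without_c = [w.distance for w in a_worlds if not c(w)]
    if not with_c:
        return False
    # clause (ii): some A-world where C holds is closer than any A-world where C fails
    return not without_c or min(with_c) < min(without_c)

worlds = [
    World("actual", 0, frozenset()),                   # match not struck, no flame
    World("w1", 1, frozenset({"struck", "flame"})),    # match struck and lights
    World("w2", 3, frozenset({"struck"})),             # match struck but fails to light
]
struck = lambda w: "struck" in w.facts
flame = lambda w: "flame" in w.facts
```

In this toy model `would(struck, flame, worlds)` holds, because the struck-and-lit world `w1` is closer to actuality than the struck-and-unlit world `w2`.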
In terms of counterfactuals,
Lewis defines a notion of causal dependence between events, which plays a
central role in his theory (1973b).
(2) Where c and e are two distinct
possible events, e causally depends on c if
and only if, if c were to occur e would
occur; and if c were not to occur e would
not occur.
This condition states that
whether e occurs or not depends on whether c occurs
or not. Where c and e are actual occurrent
events, this truth condition can be simplified somewhat. For in this case it follows from the second formal condition on the
comparative similarity relation that the counterfactual “If c were
to occur e would occur” is automatically true: this formal
condition implies that a counterfactual with true antecedent and true
consequent is itself true. Consequently, the truth condition for causal
dependence becomes:
(3) Where c and e are two
distinct actual events, e causally
depends on c if and only if, if c were
not to occur e would not occur.
The right
hand side of this condition is, of course, Hume's second definition of
causation. (As we shall see shortly, Lewis's official definition of causation
differs from it, as he defines causation not in terms of causal dependence
directly, but in terms of chains of causal dependence.) Why is it plausible to
think that causation is conceptually linked with counterfactuals in the way
specified by this definition of causal dependence? One reason is that the idea
of a cause is conceptually linked with the idea of something that makes a
difference and this idea in turn is best understood in terms of
counterfactuals. In Lewis's words: “We think of a cause as something that makes
a difference, and the difference it makes must be a difference from what would
have happened without it. Had it been absent, its effects — some of them, at
least, and usually all — would have been absent as well.” (1973b, p.161)
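Definitions (2) and (3) can be illustrated with a small sketch. It assumes a finite set of worlds ranked by a numeric distance, so that a would-counterfactual is true just in case all the closest antecedent-worlds are consequent-worlds (in the finite case this agrees with the truth condition stated earlier). The world table, the event names and the function names are hypothetical and serve only as an illustration.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy model: world name -> (distance from actuality, events occurring there).
WORLDS: Dict[str, Tuple[int, Set[str]]] = {
    "actual": (0, {"strike", "flame", "birdsong"}),
    "w1": (1, {"birdsong"}),           # no strike, no flame
    "w2": (2, {"strike", "flame"}),    # no birdsong
}

Condition = Callable[[Set[str]], bool]

def would(antecedent: Condition, consequent: Condition) -> bool:
    """Would-counterfactual: all closest antecedent-worlds are consequent-worlds."""
    dists = [d for d, ev in WORLDS.values() if antecedent(ev)]
    if not dists:
        return True  # vacuous case: no antecedent-worlds
    closest = min(dists)
    return all(consequent(ev) for d, ev in WORLDS.values()
               if antecedent(ev) and d == closest)

def causally_depends(e: str, c: str) -> bool:
    """Definition (2): 'if c occurred, e would occur' and 'if c did not occur, e would not'."""
    occurs = lambda x: (lambda ev: x in ev)
    absent = lambda x: (lambda ev: x not in ev)
    return would(occurs(c), occurs(e)) and would(absent(c), absent(e))
```

Because the actual world is at distance 0 and both events occur there, the first conjunct is automatically true for actual events, which is why (2) reduces to (3). In this model the flame causally depends on the strike but not on the birdsong.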
There are three important
things to note about the definition of causal dependence. First, it takes the
primary relata of causal dependence to be events. Lewis's own
theory of events (1986b) construes events as classes of possible spatiotemporal
regions. However, very different conceptions of events are compatible with the
basic definition. Indeed, it even seems possible to formulate it in terms of
facts rather than events. (For instance, see Mellor 1996, 2004.)
Secondly, the definition
requires the causally dependent events to be distinct from
each other. Distinctness means that the events are not identical, neither
overlaps the other, and neither implies the other. This qualification is
important if spurious non-causal dependences are to be ruled out. (For this
point see Kim 1973 and Lewis 1986b.) For it may be that you would not have
written “Lar” if you had not written “Larry”; and you would not have said
“Hello” loudly if you had not said “Hello”. But neither dependence counts as a
causal dependence since the paired events are not distinct from each other in
the required sense.
Thirdly, the
counterfactuals that are employed in the analysis are to be understood
according to what Lewis calls the standard interpretation. There are several
possible ways of interpreting counterfactuals; and some interpretations give
rise to spurious non-causal dependences between events. For example, suppose
that the events c and e are effects of a common
cause d. It is tempting to reason that there must be a causal
dependence between c and e by engaging in the
following piece of counterfactual reasoning: if c had not
occurred, then it would have to have been the case that d did
not occur, in which case e would not have occurred. But Lewis
says these counterfactuals, which he calls backtracking counterfactuals,
are not to be used in the assessment of causal dependence. The right
counterfactuals to be used are non-backtracking counterfactuals that
typically hold the past fixed up until the time at which the counterfactual
antecedent is supposed to obtain.
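The contrast between the two readings can be sketched for the common-cause case. The sketch below swaps in a simple structural model (d produces both c and e) in place of Lewis's similarity semantics, so it should be read as an approximation under that assumption; the function names are hypothetical.

```python
from typing import Dict

def effects(d: bool) -> Dict[str, bool]:
    """Assumed laws of the toy model: the common cause d produces both c and e."""
    return {"d": d, "c": d, "e": d}

ACTUAL = effects(d=True)

def non_backtracking_without_c() -> Dict[str, bool]:
    """Hold the past (d) fixed at its actual value and remove c by a local change."""
    world = dict(ACTUAL)
    world["c"] = False   # only c is altered; e depends on d alone, so e is unchanged
    return world

def backtracking_without_c() -> Dict[str, bool]:
    """Reason backwards: had c not occurred, d must not have occurred either."""
    return effects(d=False)
```

On the non-backtracking reading e still occurs when c is removed, so e does not causally depend on c. On the backtracking reading e disappears along with d, which is the spurious dependence that the standard interpretation is meant to exclude.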
2.2 The Temporal Asymmetry of Causal Dependence
What constitutes the
direction of the causal relation? Why is this direction typically aligned with
the temporal direction from past to future? In answer to these questions, Lewis
(1979) argues that the direction of causation is the direction of causal
dependence; and it is typically true that events causally depend on earlier
events but not on later events. He emphasises the
contingency of the latter fact because he regards backwards or time-reversed
causation as a conceptual possibility that cannot be ruled out a priori.
Accordingly, he dismisses any analysis of counterfactuals that would deliver the
temporal asymmetry by conceptual fiat.
Lewis's explanation of the
temporal asymmetry of counterfactual dependence is based on a de facto asymmetry
about the actual world. He defines a determinant for an event
as any set of conditions jointly sufficient, given the laws of nature, for the
event's occurrence. (Determinants of an event may be causes or traces of the
event.) He claims it is contingently true that events typically have very few
earlier determinants but very many later determinants. As an illustration, he
cites Popper's (1956) example of a spherical wavefront expanding outwards from
a point source. This is a process where each sample of the wave postdetermines what happens at the point at which the wave
is emitted. He says the reverse process in which a spherical wave contracts
inward with each sample of wave predetermining what happens at the point the
wave is absorbed would obey the laws of nature but seldom happens in actual fact.
Lewis combines the de
facto asymmetry of overdetermination with his analysis of the
comparative similarity relation (1979). According to this analysis, there are
several respects of similarity to be taken into account
in evaluating non-backtracking counterfactuals: similarity with respect to laws
of nature and also similarity with respect to particular matters of fact.
Worlds are more similar to the actual world the fewer
miracles or violations of the actual laws of nature they contain. Again, worlds
are more similar to the actual world the greater the spatio-temporal region of perfect match of particular fact
they have with the actual world. If the actual world is governed by
deterministic laws, these rules will clash in assessing which counterfactual
worlds are more similar to the actual world. For a
world that makes a counterfactual antecedent true must differ from the actual
world either in allowing some violation of the actual laws, or in differing
from the actual world in particular matters of fact. Lewis's analysis allows a
tradeoff between these competing respects of similarity in such cases. It
implies that worlds with an extensive region of perfect match of particular fact can be considered very similar to the actual
world provided that the match in particular facts with the actual world is
achieved at the cost of a small, local miracle, but not at the cost of a big,
diverse miracle. Taken by itself, this account contains no built-in time
asymmetry. That comes only when it is combined with the asymmetry of
overdetermination.
To see how the two parts
combine, consider the famous example of Nixon and the Nuclear Holocaust. An
early objection to Lewis's account of counterfactuals (Fine 1975) was that,
counterintuitively, it makes this counterfactual false:
(4) If Nixon had pressed the button, there would have been a nuclear
war.
The argument is that a
world in which Nixon pressed the button, but some minute violation of the laws
then prevented a nuclear war, is much more like the actual world than one in
which Nixon pressed the button and a nuclear war took place. Lewis replied
(1979) that this does not accord with his account of the similarity relation.
On this account, a button-pressing world that diverges from the actual world by
virtue of a miracle is more like the actual world than a button-pressing world
that converges with the actual world by virtue of a miracle. For in view of the
asymmetry of overdetermination, the divergence miracle that allows Nixon to
press the button need only be a small, local miracle, but the convergence
miracle required to wipe out the traces of Nixon's pressing the button must be
a very big, diverse miracle. Of course, if the asymmetry of overdetermination
went in the opposite temporal direction, the very same standards of similarity
would dictate the opposite verdict.
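The way the similarity standards rank the button-pressing worlds can be sketched as a lexicographic comparison. The ordering of priorities used here (avoid big miracles first, then maximise the region of perfect match, then avoid small miracles) follows Lewis (1979); the numeric scores and the three candidate worlds are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Candidate:
    name: str
    big_miracles: int      # widespread, diverse violations of actual law
    perfect_match: float   # assumed score in [0, 1] for the region of perfect match
    small_miracles: int    # small, local violations of actual law
    war: bool              # does a nuclear war occur in this world?

def closeness_key(w: Candidate) -> Tuple[int, float, int]:
    """Smaller key = more similar to actuality, comparing the priorities in order."""
    return (w.big_miracles, -w.perfect_match, w.small_miracles)

button_worlds = [
    # Small divergence miracle lets Nixon press; the past matches actuality perfectly.
    Candidate("divergence only", 0, 0.5, 1, war=True),
    # A second small miracle stops the signal, but traces of the pressing remain.
    Candidate("divergence plus small block", 0, 0.5, 2, war=False),
    # A big convergence miracle wipes out every trace, restoring the actual future.
    Candidate("divergence plus convergence", 1, 0.99, 1, war=False),
]

closest = min(button_worlds, key=closeness_key)
```

Under these assumed scores the closest button-pressing world is the one with the divergence miracle alone, in which the war occurs, so counterfactual (4) comes out true.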
In general, then, the
symmetric analysis of similarity, combined with the de facto asymmetry
of overdetermination, implies that worlds that accommodate counterfactual
changes by preserving the actual past and allowing for divergence miracles are
more similar to the actual world than worlds that
accommodate such changes by allowing for convergence miracles that preserve the
actual future. This fact in turn implies that, where the asymmetry of
overdetermination obtains, the present counterfactually depends on the past,
but not on the future.
2.3 Transitivity and Preemption
According to Lewis, causal
dependence between actual events is sufficient for causation, but not necessary
(1973b): it is possible to have causation without causal dependence. This can
happen in the following way. Suppose that c causes d in
virtue of the fact that d causally depends on c,
and d causes e in virtue of the fact
that e causally depends on d. Then because
causation is transitive, Lewis insists, c must cause e.
However, because causal dependence, unlike causation, is not transitive, the causal
relation between c and e may not be matched
by a causal dependence. (We shall shortly consider an example of this kind.)
To overcome this problem
Lewis extends causal dependence to a transitive relation by taking its
ancestral. He defines a causal chain as a finite sequence of
actual events c, d, e,…
where d causally depends on c, e on d,
and so on throughout the sequence. Then causation is finally defined in these
terms:
(5) c is a cause of e if
and only if there exists a causal chain leading from c to e.
This definition not only
ensures the transitivity of causation, but it also appears to solve an
additional problem to do with preemption that is illustrated by the following
example. Suppose that two crack marksmen conspire to assassinate a hated
dictator, agreeing that one or other will shoot the dictator on a public
occasion. Acting side-by-side, assassins A and B find
a good vantage point, and, when the dictator appears, both take aim. A pulls his trigger and fires a shot that hits its mark,
but B desists from firing when he sees A pull
his trigger. Here assassin A's actions are the actual cause of the
dictator's death, while B's actions are a preempted potential
cause. (Lewis distinguishes such cases of preemption from
cases of symmetrical overdetermination in which two processes
terminate in the effect, with neither process preempting the other. Lewis
believes that these cases are not suitable test cases for a theory of causation
since they do not elicit clear judgements.) The problem raised by this example
of preemption is that both actions are on a par from the point of view of
causal dependence: if neither A nor B acted,
then the dictator would not have died; and if either had acted without the
other, the dictator would have died.
However, given the
definition of causation in terms of causal chains, Lewis is
able to distinguish the preempting actual cause from the preempted
potential cause. There is a causal chain running from A's actions to
the dictator's death, but no such chain running from B's actions to
the dictator's death. Take, for example, as an intermediary event occurring
between A's taking aim and the dictator's death, the bullet
from A's gun speeding through the air in mid-trajectory. The
speeding bullet causally depends on A's action since the bullet would not have been in mid-trajectory without A's
action; and the dictator's death causally depends on the speeding bullet since by the time the bullet
is in mid-trajectory B has refrained from firing so that the
dictator would not have died without the presence of the speeding bullet. (Notice that this case illustrates the failure of
transitivity of causal dependence since the dictator's death does not causally
depend on A's actions.) Hence, we have a causal chain, and so
causation. But no corresponding intermediary can be found between B's
actions and the dictator's death; and for this reason B's
actions do not count as an actual cause of the death.
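The assassin example can be worked through in a toy deterministic model. The sketch below assumes a discretised sequence of events and simple laws relating them, and it evaluates each non-backtracking counterfactual by suppressing one event while everything earlier keeps its actual value. The variable names and the laws are assumptions made for illustration; `depends` follows definition (3) and `is_cause` follows definition (5).

```python
from typing import Dict

# Events in temporal order (an assumed discretisation of the example).
ORDER = ["A_fires", "B_aims", "B_fires", "bullet_in_flight", "death"]

def laws(v: Dict[str, bool], name: str) -> bool:
    """Assumed laws of the toy model, giving each event in terms of earlier ones."""
    if name in ("A_fires", "B_aims"):
        return True                                   # both assassins act as planned
    if name == "B_fires":
        return v["B_aims"] and not v["A_fires"]       # B fires only if A does not
    if name == "bullet_in_flight":
        return v["A_fires"]                           # A's bullet in mid-trajectory
    return v["bullet_in_flight"] or v["B_fires"]      # death of the dictator

def run(removed: str = "") -> Dict[str, bool]:
    """Course of events; `removed` (if given) is suppressed, earlier events are untouched."""
    v: Dict[str, bool] = {}
    for name in ORDER:
        v[name] = False if name == removed else laws(v, name)
    return v

ACTUAL = run()

def depends(e: str, c: str) -> bool:
    """Definition (3): c and e actually occur, and without c, e would not occur."""
    return ACTUAL[c] and ACTUAL[e] and not run(removed=c)[e]

def is_cause(c: str, e: str) -> bool:
    """Definition (5): some chain of causal dependences leads from c to e."""
    frontier, seen = [c], {c}
    while frontier:
        x = frontier.pop()
        for y in ORDER:
            if y not in seen and depends(y, x):
                if y == e:
                    return True
                seen.add(y)
                frontier.append(y)
    return False
```

In this model the death does not depend on A's firing directly, yet A's firing is a cause of the death via the bullet in flight, whereas no chain leads from B's taking aim to the death.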
So far
we have considered how the counterfactual theory of causation works under the
assumption of determinism. But what about causation when determinism fails?
Lewis (1986c) argues that chancy causation is a conceptual possibility that
must be accommodated by a theory of causation. Indeed, contemporary physics
tells us the actual world abounds with probabilistic processes that are causal
in character. To take a familiar example (Lewis 1986c): suppose that you
mischievously hook up a bomb to a radioactive source and Geiger
counter in such a way that the bomb explodes when the counter registers a
certain number of clicks. If it happens that the counter registers the required
number of clicks and the bomb explodes, your act caused the explosion, even
though there is no deterministic connection between them.
In order to accommodate chancy causation, Lewis (1986c) defines a
more general notion of causal dependence in terms of chancy counterfactuals.
These counterfactuals are of the form “If A were the case, Pr(C) would be x”, where
the counterfactual is an ordinary would-counterfactual, interpreted according
to the semantics above, and the Pr operator
is a probability operator with narrow scope confined to the consequent of the
counterfactual. Lewis interprets the probabilities involved as temporally
indexed single-case chances. (See his (1980) for the theory of single-case
chance.)
The more general notion of
causal dependence reads:
(6) Where c and e are distinct
actual events, e causally depends on c if
and only if, if c had not occurred, the chance of e's
occurring would be much less than its actual chance.
This definition covers
cases of deterministic causation in which the chance of the effect with the
cause is 1 and the chance of the effect without the cause is 0. But it also
allows for cases of irreducible probabilistic causation where these chances can
take non-extreme values. It is similar to the central
notion of probabilistic relevance used in probabilistic theories of
type-causation, except that it employs chancy counterfactuals rather than
conditional probabilities. (See the discussion in Lewis 1986c for the
advantages of the counterfactual approach over the probabilistic one. Also see
the entry “Probabilistic Causation”.)
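Definition (6) can be sketched numerically. The definition leaves "much less" unquantified, so the sketch below reads it as "smaller by at least a given factor"; that reading, the default factor of 10, and the example chances are all assumptions introduced for illustration.

```python
def chancy_depends(actual_chance: float, chance_without_c: float,
                   factor: float = 10.0) -> bool:
    """Definition (6), reading 'much less' as 'at least `factor` times smaller' (assumed)."""
    for p in (actual_chance, chance_without_c):
        if not 0.0 <= p <= 1.0:
            raise ValueError("chances must lie in [0, 1]")
    return actual_chance > 0.0 and chance_without_c * factor <= actual_chance

# Deterministic limiting case: chance 1 with the cause, chance 0 without it.
deterministic = chancy_depends(1.0, 0.0)

# Bomb wired to a Geiger counter (illustrative numbers): hooking it up gives the
# explosion an appreciable chance; without the hook-up the chance is negligible.
bomb = chancy_depends(0.5, 0.0001)

# A marginal difference in chance does not count as dependence on this reading.
marginal = chancy_depends(0.5, 0.45)
```

The deterministic case falls out as the special case in which the two chances are 1 and 0, while the intermediate case shows dependence with non-extreme chances.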
The rest of the theory of
chancy causation follows the outlines of the theory of deterministic causation.
Causal dependence is extended to a transitive notion by taking its ancestral.
As before, we have causation when we have one or more steps of causal
dependence.
Before turning to survey
some of the problems confronting Lewis's theory of causation, it is worthwhile
pausing to consider some of the advantages it affords.
At the time that Lewis
advanced his original theory, regularity theories of causation were the
orthodoxy. Taking Hume's first definition as their point of departure, these
theories defined causation in terms of subsumption under lawful regularities. A
typical formulation went like this: c is a cause of e if
and only if c belongs to a minimal set of conditions that are
jointly sufficient for e, given the laws.
It was well known that theories of this kind were faced with a
number of recalcitrant counterexamples. Thus, while c might
belong to a minimal set of sufficient conditions for e when c is
a genuine cause of e, this might also be true when c is
an effect of e — an effect which could not have occurred,
given the laws and the actual circumstances, except by being caused by e.
Or it might be true when c and e are joint effects
of a common deterministic cause. Or when c is a preempted
potential cause of e — something that did not cause e, but would have done so if the actual cause had
been absent.
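The joint-effects counterexample can be checked mechanically in a toy model. The sketch below assumes that the laws are represented by the set of event-combinations they permit, and that a set of conditions is sufficient for e when e occurs in every permitted combination in which all of those conditions occur. The three-event model and all function names are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, List

World = Dict[str, bool]

def lawful_worlds(events: List[str], law: Callable[[World], bool]) -> List[World]:
    """All assignments of occurrence/non-occurrence that the assumed laws permit."""
    worlds = []
    for values in product([True, False], repeat=len(events)):
        w = dict(zip(events, values))
        if law(w):
            worlds.append(w)
    return worlds

def sufficient(conds: FrozenSet[str], e: str, worlds: List[World]) -> bool:
    """conds are jointly sufficient for e: e occurs wherever all of conds occur."""
    return all(w[e] for w in worlds if all(w[c] for c in conds))

def in_minimal_sufficient_set(c: str, e: str, events: List[str],
                              worlds: List[World]) -> bool:
    """Regularity formulation: c belongs to some minimal sufficient set for e."""
    others = [x for x in events if x not in (c, e)]
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            s = frozenset((c, *extra))
            # Sufficiency is monotone, so checking one-element removals suffices.
            if sufficient(s, e, worlds) and not any(
                sufficient(s - {x}, e, worlds) for x in s
            ):
                return True
    return False

# Common deterministic cause d with joint effects c and e.
EVENTS = ["d", "c", "e"]
WORLDS = lawful_worlds(EVENTS, lambda w: w["c"] == w["d"] and w["e"] == w["d"])
```

Here `{c}` is by itself a minimal sufficient set for e, so the regularity formulation counts c as a cause of e even though the two are merely joint effects of d.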
In contrast, Lewis's
counterfactual analysis of causation is not subject to the same
counterexamples, so long as the counterfactuals in the definition of causal
dependence and causation are interpreted in a non-backtracking fashion. The
theory implies that even if c belongs to a minimal set of
sufficient conditions for e, e will
not causally depend on c when c occurs
after e as its effect, since earlier events do not typically
causally depend on later events. Nor will e causally
depend on c when c and e are
joint effects of a common cause, since the non-backtracking counterfactual “If c had
not occurred, e would still have
occurred” will be true in view of the fact that it
holds fixed the presence of the common cause. Nor will c count
as a cause of e when c is a preempted
potential cause of e in a typical case of preemption. For, as
we have seen, c will not be connected to e by
a chain of causal dependences.
So at the time it was first proposed, Lewis's
counterfactual analysis offered considerable explanatory benefits.
3. Problems for Lewis's Counterfactual Theory
In this section we consider
the principal difficulties for Lewis's theory that have emerged in discussion
over the last thirty years.
One relatively overlooked
aspect of the concept of causation is its sensitivity to contextual factors. In
so far as Lewis's theory overlooks this context-sensitivity, the omission is a
problem for the theory.
The theory assumes that
causation is an absolute relation whose nature does not vary from one context
to another. (This follows from the way the counterfactuals that define the
central notion of causal dependence are governed by a unique, context-invariant
system of weighted respects of similarity.) According to the theory, any event
but for which an effect would not have occurred is one of the effect's causes.
But this generates some absurd results. For example, suppose a camper lights a
fire, a sudden gust of wind fans the fire, the fire gets out of control and the
forest burns down. It is true that if the camper had not lit the fire, the
forest fire would not have occurred. But it is also true that the forest fire
would not have occurred if any of a vast number of contingencies, including the
camper's birth and his failure to be struck down by a meteor before striking
the match, had not occurred. But commonsense draws a distinction between causes
and background conditions, ranking the camper's lighting of the fire among the
former, and his birth and his failure to be struck down by a meteor, among the
latter.
H. L. A. Hart and A.
Honoré (1965; 2nd ed 1985) argue that the distinction between causes and
conditions is relative to context in at least two different ways. One form of
relativity might be called relativity to the context of occurrence. If a forest
is destroyed by fire, the presence of oxygen would be cited as a mere condition
of the forest's destruction. On the other hand, if a fire breaks out in a
laboratory where oxygen is deliberately excluded, it may be appropriate to cite
the presence of oxygen as a cause of the fire. The second form of relativity
might be called relativity to the context of enquiry. For example, the cause of
a great famine in India may be identified by an Indian farmer as the drought,
but the World Food Authority may identify the Indian government's failure to
build up reserves as the cause, and the drought as a mere condition.
For the most part, Lewis
ignores these subtle context-sensitive distinctions, as he says he is
interested in a broad notion of cause. In his view (1986d), every event has an
objective causal history consisting of a vast structure of events ordered by
causal dependence. The human mind may select parts of the causal history for
attention, perhaps different parts for different purposes of enquiry. However,
Lewis does not specify the ‘principles of invidious selection’ by which some
parts of the causal history are selected for attention, except to mention the
relevance of Grice's maxims of conversation. But Grice's maxims of
conversation, as general principles of rational information exchange, are not
well suited to explaining the causation-specific distinctions we draw. As
several philosophers have pointed out (A. Garfinkel (1981); C. Hitchcock
(1996a, 1996b); P. Lipton (1990); J. Woodward (1984); and B. van Fraassen (1981)), some of the contextual principles behind
our causal judgements seem to rely on considerations concerning which class of
situations the effect is contrasted with.
Thus, in the example of
the Indian famine, we contrast the actual situation in which a famine occurs
with another situation in which normal conditions prevail and a famine does not
occur. A cause is then thought of as a factor that makes the difference between
these situations; and the background conditions are thought of as those factors
that are common to the two situations. In different contexts of enquiry, the
contrast situation is framed in different terms. A farmer may take the contrast
situation to be the normal situation in which the government does not stockpile
food reserves but there is no famine. In this case it would be reasonable for
the farmer to identify the drought as the factor that makes the difference
between this contrast situation and the actual situation in which there is
famine. On the other hand, an official of the World Food Authority with a
different conception of what normally happens may take the contrast situation
to be one in which governments build up food reserves as a precaution against
droughts. Consequently, it would be reasonable for the official to see the
failure of the government to build up food reserves as the factor that makes
the difference between the contrast situation and the actual situation in which
there is a famine. (For discussion of the relevance of contrastive explanation
to the causes/conditions distinction see Menzies 2004a; 2007.)
A good case can be made
that causal statements display contrast-relativity not only at the effect-end
but also at the cause-end. (See Hitchcock 1996a, 1996b; Maslen 2004; Schaffer
2005.) Recognising this helps to deal with a problem
affecting Lewis's original theory. In evaluating whether an event c caused
an event e, Lewis's theory says we have to
consider what would have happened in those closest worlds in which c did
not occur. For example, in evaluating whether the camper's lighting of the fire
caused the forest fire, we have to consider what would have
happened in those closest possible worlds in which the camper's action of
lighting the fire did not occur. Are these worlds in which the camper does not
light the fire but does something else instead, or are they worlds in which he
lights the fire in a slightly different manner (perhaps with a lighter instead of
matches) or at a slightly different time (three minutes later when the wind
died down)? In order to answer such questions, Lewis
says it is necessary to say how much of a change or a delay it takes for an
event to become an altogether different event, rather than a different version
of the same event. (Lewis sometimes discusses this issue as the question of how
fragile events are: a modally fragile event is one which
cannot occur in a different manner or at a different time from its actual
manner and time of occurrence. See Lewis 1986b.) The problem, as he sees it, is
that there is no unique principled way in which we do this: there is linguistic
indeterminacy about what event nominals refer to. He
writes: “We have not made up our minds: and if we presuppose sometimes one
answer and sometimes another answer, we are entirely within our linguistic
rights. This is itself a big problem for a counterfactual analysis of
causation, quite apart from the problem of preemption.” (2000, p.186)
However, if we recognise that the cause-end of causal statements displays
contrast-relativity as well as the effect-end, we can obviate the need to
provide an account of the identity of events under counterfactual changes. For
example, suppose we are interested in why the forest fire took one path P1 rather
than another path P2. Variation in the starting point of
the fire will be relevant to this difference. So it
would be appropriate to say that the camper's lighting the fire in
location L1 rather than another location L2 caused
the forest fire to take path P1 rather than P2.
On the other hand, suppose that we are interested in why the forest fire occurred
rather than not occurring at all. Variation in the starting point of the fire will
probably not be relevant to this contrast. Rather the appropriate causal
statement will be one that says the camper's lighting the fire (in some or
other location) rather than his not lighting it (in any location) caused the fire to
occur rather than not to occur. Such causal statements reveal the relevant
contrasts at both the cause- and the effect-ends. Sometimes, such contrasts are
indicated by the use of emphasis as in “The fire's
starting in location L1 caused the fire to
take path P1”. But more often than not the surface form
of causal statements does not disclose the contrasts that are intended and they must be supplied by context. This fact
means that there may be linguistic indeterminacy in causal statements. But it
is not indeterminacy about the reference of event nominals, but rather about
the situations that are intended as contrasts for the cause and the effect.
Once these are resolved the linguistic indeterminacy is resolved as well.
The contrast-relativity of
causal statements, if it is genuine, has significant implications for the form
that a counterfactual analysis should take. Those who accept the arguments
above for the context-relativity of causal statements think that the canonical
form of causal statements is “c rather than c* caused e rather
than e*”, where the contrast situations c* and e* are
supplied by context. This suggests that the definition of causal dependence
should not be formulated in terms of the counterfactual “If c had
not occurred, e would not have
occurred”, but the more specific counterfactual “If c* had
occurred instead of c, then e* would have occurred
instead of e”. This formulation has several advantages over the old
formulation. (See Schaffer 2005.) Its chief advantage from the point of view of our
discussion is that it obviates the need for the counterfactual theory to
provide an account of the identity of events under hypothetical changes. With
this new formulation, there is no need to work out whether c* and e* are
identical with, or different from, c and e, respectively. It is simply stipulated on the basis of contextual considerations that c* and e* are
intended to act as contrasts to c and e.
There have been several
important critical discussions of Lewis's explanation of the temporal asymmetry
of causation. (See A. Elga 2000; M. Frisch 2005; D.
Hausman 1998, Chap. 6; P. Horwich 1987, Chap. 10; and H. Price 1996, Chap. 6.)
One kind of criticism has
focused on the psychological implausibility of Lewis's explanation. (See
Horwich 1987.) Recall that the explanation appeals, on the one hand, to a
system of weighted respects of similarity between possible worlds that is
delivered by a priori conceptual analysis and, on the other
hand, to an asymmetry of overdetermination that is claimed to be a
contingent a posteriori truth
about the actual world. The two-part explanation is supposed to employ facts
that are sufficiently well known to play a role in the explanation of our
linguistic use of counterfactuals. However, it is psychologically implausible
that the intricate system of weighted respects of similarity involving
comparison of miracles of different sizes could capture the intuitive similarity
relation used in counterfactual reasoning. Why should we have developed such a
baroque notion of similarity? Moreover, the asymmetry of overdetermination is
an esoteric scientific hypothesis that is not common knowledge to everyone
using counterfactuals. So it is very unlikely that
this hypothesis could account for ordinary speakers' mastery of the temporal
asymmetry of counterfactuals. (For Lewis's reply to this criticism see
Postscript E to “Counterfactual Dependence and Time's Arrow” in his (1986a, p.
66).)
Another criticism is that
the asymmetry of overdetermination does not exist in the form required to
support Lewis's explanation of the temporal asymmetry of counterfactuals.
Lewis's idea is that any event e has many postdeterminants
and few predeterminants, where a predeterminant or postdeterminant
of an event is a set of conditions that are jointly sufficient, given the laws
of nature, for the occurrence of the event. But if Lewis is assuming that the
laws involved are like those of classical mechanics, he is mistaken on this
score. For a theory that is time symmetric and deterministic in both the
forward and backward direction will imply that for any local event e and
any time t, there is a unique set of conditions obtaining at t that
are necessary and sufficient, given the laws, for the occurrence of the
event e. The conditions may not be localized conditions that are
typically regarded as events, but nonetheless they will qualify as
predeterminants or postdeterminants. For example,
consider Popper's example of the wave spreading out from a point source. If
there is a process that postdetermines what happens
at the point at which the wave is emitted, there is also a process, perhaps a
very unlocalized process, that predetermines this. Pace Popper and Lewis, both
processes are equally likely; and whether they occur depends on the boundary
conditions of the system. (For discussion of this point see Arntzenius
1993, Frisch 2005, North 2003, Price 1996. Also see the entry “Thermodynamic
Asymmetry in Time”.)
A related criticism
concerns the asymmetry of miracles that is central to Lewis's account of the
temporal asymmetry of causation. The asymmetry of miracles consists in the fact
that a miracle that realises a counterfactual
antecedent about particular facts at time t by having a
possible world diverge from the actual world just before the time t is smaller and less diverse than a miracle
that realises the same counterfactual antecedent and
makes a possible world converge to the actual world after the time t.
Adam Elga (2000) has argued that the asymmetry of
miracles does not hold in many cases.
Elga's argument proceeds by way of an example: Gretta cracks
an egg into a hot frying pan at 8:00am and at 8:05am the egg is cooked.
Consider the process that occurs in the period from 8:00am to 8:05am, run
backwards in time: a cooked egg sits in the frypan; it coalesces into a raw egg
and leaps upward; and a shell closes around it. The laws of thermodynamics
allow that this process is physically possible but extremely rare. These laws
also state that the process is very sensitive to its initial conditions: even
the slightest changes in the molecules making up the state of the cooked egg
would result in the process evolving in such a way that the cooked egg continues
to sit in the pan rather than coalescing into a raw egg and leaping upwards.
But this is, Elga points out, exactly the kind of
change that would make for a “convergence miracle”. Take the state of the
actual world at 8:05am, holding fixed its future after this point; make some
small changes to the molecules making up this state; and then run the laws of
thermodynamics backwards in time, and we will almost certainly arrive at a
state in which the egg sits in the pan growing colder. This state will be one in
which Gretta does not crack the egg. The small change in the state of the
actual world at 8:05am is a “convergence miracle” that yields a possible world
that realises the counterfactual proposition that
Gretta does not crack the egg at 8:00am while holding fixed the actual future
after 8:05am. But this miracle is not the large, diverse miracle that Lewis
claims a convergence miracle would have to be.
As we have seen, Lewis
builds transitivity into causation by defining it in terms of chains of causal
dependence. The transitivity of causation fits with some of our explanatory
practices. For example, historians wishing to explain some significant
historical event will trace the explanation back through a
number of causal links, concluding that the event at the beginning of
the causal chain is responsible for the event being explained. On the other
hand, a number of counterexamples have been presented
which cast doubt on transitivity. (Lewis 2004a presents a short catalogue of
these counterexamples.) Here is a sample of three counterexamples.
First, an example due to
Michael McDermott (1995). A and B each have a
switch in front of them, which they can move to the left or right. If both
switches are thrown into the same position, a third person C receives
a shock. A does not want to shock C. Seeing B's
switch in the left position, A moves her switch to the
right. B does want to shock C. Seeing A's
switch thrown to the right, she now moves her switch to the right as
well. C receives a shock. Clearly, A's throwing
her switch to the right causes B to throw her switch to the
right, which in turn causes C to receive the shock. But A attempted to prevent the shock so that it
seems unreasonable to say that A's move causes C to
be shocked.
Second, an example due to
Ned Hall (2004). A person is walking along a mountain trail,
when a boulder high above is dislodged and comes careering down the
mountain slopes. The walker notices the boulder and ducks at the appropriate
time. The careering boulder causes the walker to duck and this, in turn, causes
his continued stride. (This second causal link involves double
prevention: the duck prevents the collision between walker and boulder
which, had it occurred, would have prevented the walker's continued stride.)
However, the careering boulder is the sort of thing that would prevent the
walker's continued stride and so it seems counterintuitive to say that it
causes the stride.
Third, an example due to
Douglas Ehring (1987). Jones puts some potassium
salts into a hot fire. Because potassium compounds produce a purple flame when
heated, the flame changes to a purple colour, though
everything else remains the same. The purple flame ignites some flammable
material nearby. Here we judge that putting the potassium salts in the fire
caused the purple flame, which in turn caused the flammable material to ignite.
But it seems implausible to judge that putting the potassium salts in the fire
caused the flammable material to ignite.
Various replies have been
made to these counterexamples. The last counterexample seems the most easily
deflected. For example, Maslen (2004), who endorses the contrast-relativity of causal statements, has argued that this example is
misdiagnosed as a counterexample to transitivity, as the contrast situation at
the effect-end of the first causal statement does not match up with the
contrast situation at the cause-end of the second causal statement. Thus, the
first causal statement should be interpreted as saying that Jones's putting
potassium salts in the fire rather than not doing so caused the flame to turn
purple rather than yellow; but the second causal statement should be
interpreted as saying that the purple fire's occurring rather than not
occurring caused the flammable material to ignite rather than not to
ignite. Where there is a mismatch of this kind, we do not have a genuine
counterexample to transitivity. L. Paul (2004) offers a similar diagnosis of
the last example, though her diagnosis proceeds in terms of event aspects,
which she takes to be causation's primary relata. She argues similarly that there
is a mismatch between the event aspect that is the effect of the first causal
link (the flame's being a purple colour) and the
event aspect that is the cause of the second causal link (the flame's touching the
flammable material).
The first and second
examples cannot be handled in the same way. Some defenders of transitivity have
replied that our intuitions about the intransitivity of causation in these
examples are misleading. For instance, Ned Hall (2000) has argued that we
should suspect our intuition in the second example because it involves double
prevention, which he claims is not a genuine kind of causation. Thus, he denies
that the walker's ducking caused his continued stride since this holds only by double
prevention. (This also commits him to denying that causal dependence is
sufficient for causation, since the walker's continued stride causally depends
on his ducking the boulder.) He offers a different diagnosis of why our
intuitions go awry in the first example. Lewis (2004a) adopts a similar
strategy of trying to explain away the force of our intuitions in these
examples. He points out that the counterexamples to transitivity typically
involve a structure in which a c-type event generally prevents an e-type
but in the particular case the c-event
actually causes another event that counters the threat and causes the e-event.
If we mix up questions of what is generally conducive to what, with questions
about what caused what in this particular case, he
says, we may think that it is reasonable to deny that c causes e.
But if we keep the focus sharply on the particular case,
we must insist that c does in fact cause e.
The debate about the
transitivity of causation is not easily settled, partly because it is tied up
with the issue of how it is best for a counterfactual theory to deal with
examples of preemption. As we have seen, Lewis's counterfactual theory relies
on the transitivity of causation to handle cases of preemption. If such cases
could be handled in some other way, that would take some of the theoretical
pressure off the theory, allowing it to concede the
persuasive counterexamples to transitivity without succumbing to the
difficulties posed by preemption. (For more on this point see Hitchcock 2001.)
As we have seen, Lewis
employs his strategy of defining causation in terms of chains of causal
dependence not only to make causation transitive, but also to deal with
preemption examples. However, there are preemption examples that this strategy
cannot deal with satisfactorily. Difficulties concerning preemption have proven
to be the biggest bugbear for Lewis's theory.
In his (1986c), Lewis
distinguishes cases of early and late preemption.
In early preemption examples, the process running from the preempted
alternative is cut short before the main process running from the preempting
cause has gone to completion. The example of the two assassins, given above, is
an example of this sort. The theory of causation in terms of chains of causal
dependence can handle this sort of example. In contrast, cases of late
preemption are ones in which the process running from the preempted cause is
cut short only after the main process has gone to completion and brought about
the effect. The following is an example of late preemption due to Hall (2004).
Billy and Suzy throw rocks
at a bottle. Suzy throws first so that her rock arrives first and shatters the
glass. Without Suzy's throw, Billy's throw would have shattered the bottle.
However, Suzy's throw is the actual cause of the shattered bottle, while
Billy's throw is merely a preempted potential cause. This is a case of late
preemption because the alternative process (Billy's throw) is cut short after
the main process (Suzy's throw) has actually brought
about the effect.
Lewis's theory cannot
explain the judgement that Suzy's throw was the actual cause of the shattering
of the bottle. For there is no causal dependence between Suzy's throw and the
shattering, since even if Suzy had not thrown her rock, the bottle would have
shattered due to Billy's throw. Nor is there a chain of stepwise dependences
running from cause to effect, because there is no event
intermediate between Suzy's throw and the shattering that links them up into a
chain of dependences. Take, for instance, Suzy's rock in mid-trajectory.
Certainly, this event depends on Suzy's initial throw, but the problem is that
the shattering of the bottle does not depend on it, because even without it the
bottle would still have shattered because of Billy's throw.
To be sure, the bottle
shattering that would have occurred without Suzy's throw would be different
from the bottle shattering that actually occurred with
Suzy's throw. For a start, it would have occurred later. This observation
suggests that one solution to the problem of late preemption might be to insist
that the events involved should be construed as fragile events. Accordingly, it
will be true rather than false that if Suzy had not thrown her rock, then the
actual bottle shattering, taken as a fragile event with an essential time and
manner of occurrence, would not have occurred. Lewis himself does not endorse
this response on the grounds that a uniform policy of construing events as
fragile would go against our usual practices, and
would generate many spurious causal dependences. For example, suppose that a
poison kills its victim more slowly and painfully when taken on a full stomach.
Then, the victim's eating dinner before he drinks the poison would count as a
cause of his death since the time and manner of the death depend on the eating
of the dinner. (For discussion of the limitations of this response see Lewis
1986c, 2000.)
When we turn from
preemption examples involving deterministic causation to those involving chancy
causation, we see that the problems for Lewis's theory multiply. One
particularly recalcitrant problem is described in Menzies 1989. (See also
Woodward 1990.) Suppose that two systems can produce the same effect, perhaps
at the same time and in the same manner. (It does not matter whether this is an
example of early or late preemption.) However, one system is much more reliable
than the other. The reliable system starts and, left
to itself, will very probably produce the effect. But you do not leave it to
itself. You throw a switch that shuts down the reliable system and turns on the
unreliable one. As luck would have it, the unreliable system works and brings
about the effect. This kind of example presents a problem for the probabilistic
generalisation of the counterfactual theory because
the preempting actual cause decreases the chance of the effect while the
preempted potential cause increases its chance. In addition to the problem of
explaining how the preempting cause qualifies as a cause when the effect does
not causally depend on it, the probabilistic counterfactual theory faces the
problem of explaining how the preempted cause is not really a cause when the
effect does causally depend on it. (Examples of this kind have been the subject
of extensive discussion in the context of both counterfactual and probabilistic
theories of causation. For discussions about how best to deal with them within
theories admitting of indeterminism, see Barker 2004; Beebee 2004; Dowe 2000, 2004;
Hitchcock 2004; Kvart 2004; Noordhof
1999, 2004; Ramachandran 1997, 2004.)
In this section we shall
consider some recent developments of the counterfactual approach to causation,
which have been motivated by the desire to overcome the deficiencies in Lewis's
1973 theory, especially with respect to preemption.
In an attempt to deal with the various problems facing his 1973
theory, Lewis developed a new version of the counterfactual theory, which he
first presented in his Whitehead Lectures at Harvard University in March 1999.
(A shortened version of the lectures appeared as his (2000). The full lectures
are published as his (2004a).)
Counterfactuals play a
central role in the new theory, as in the old. But the counterfactuals it
employs do not simply state dependences of whether one event
occurs on whether another event occurs. The counterfactuals
state dependences of whether, when, and how one
event occurs on whether, when, and how another
event occurs. A key idea in the formulation of these counterfactuals is that of
an alteration of an event. This is an actualised
or unactualised event that occurs at a slightly
different time or in a slightly different manner from the given event. An
alteration is, by definition, a very fragile event that could not occur at a
different time, or in a different manner without being a different event. Lewis
intends the terminology to be neutral on the issue of whether an alteration of
an event is a version of the same event or a numerically different event.
The central notion of the
new theory is that of influence.
(7) Where c and e are distinct
events, c influences e if and only
if there is a substantial range c1, c2, … of
different not-too-distant alterations of c (including the
actual alteration of c) and there is a range e1, e2,
… of alterations of e, at least some of which differ, such that
if c1 had occurred, e1 would have occurred, and
if c2 had occurred, e2 would have occurred, and so on.
Where one event influences
another, there is a pattern of counterfactual dependence of whether, when, and
how upon whether, when, and how. As before, causation is defined as an
ancestral relation.
(8) c causes e if
and only if there is a chain of stepwise influence from c to e.
One of the points Lewis
advances in favour of this new theory is that it
handles cases of late as well as early preemption. (The theory is restricted
to deterministic causation and so does not address the example of probabilistic
preemption described in section 3.4.) Reconsider, for instance, the example of
late preemption involving Billy and Suzy throwing rocks at a bottle. The theory
is supposed to explain why Suzy's throw, and not Billy's throw, is the cause of
the shattering of the bottle. If we take an alteration in which Suzy's throw is
slightly different (the rock is lighter, or she throws sooner), while holding
fixed Billy's throw, we find that the shattering is different too. But if we make
similar alterations to Billy's throw while holding Suzy's throw fixed, we find
that the shattering is unchanged.
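The contrast between the two throws can be sketched in a toy model. The sketch below is only an illustration under stated assumptions: the throw times, the flight time, the choice of "nearby" alterations, and all function names are hypothetical, and only the timing of the throws is varied, not their manner. It also simplifies the requirement of a substantial range of alterations to the weaker check that at least some of the resulting shatterings differ.

```python
# Toy model of influence for the Billy/Suzy case.
# All numbers and names here are illustrative assumptions, not part of the theory.

FLIGHT_MS = 1000        # assumed flight time of a rock, in milliseconds
SUZY_THROW_MS = 0       # assumed actual time of Suzy's throw
BILLY_THROW_MS = 500    # assumed actual time of Billy's (later) throw


def shatter_time(suzy_throw_ms, billy_throw_ms):
    """The bottle shatters when the first rock to be thrown arrives."""
    return min(suzy_throw_ms, billy_throw_ms) + FLIGHT_MS


def shatter_alterations(vary, deltas):
    """Shattering times when one throw is shifted by each delta, the other held fixed."""
    times = []
    for d in deltas:
        if vary == "suzy":
            times.append(shatter_time(SUZY_THROW_MS + d, BILLY_THROW_MS))
        else:
            times.append(shatter_time(SUZY_THROW_MS, BILLY_THROW_MS + d))
    return times


def influences(times):
    """Simplified test: at least some alterations of the effect differ."""
    return len(set(times)) > 1


# Assumed range of not-too-distant alterations (0 is the actual throw).
NEARBY = [-200, -100, 0, 100, 200]

suzy_times = shatter_alterations("suzy", NEARBY)
billy_times = shatter_alterations("billy", NEARBY)

print(influences(suzy_times))   # does altering Suzy's throw alter the shattering?
print(influences(billy_times))  # does altering Billy's throw alter the shattering?
```

On these assumptions every nearby alteration of Suzy's throw yields a different shattering time, while every nearby alteration of Billy's throw leaves it untouched. The result depends entirely on which alterations are counted as nearby.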
Another point in favour of the new theory is that it handles a type of
preemption that has come to be called trumping. (Trumping
was first described by Jonathan Schaffer: see his (2000).) Lewis gives an
example involving a major and a sergeant who are shouting orders at the
soldiers. The major and sergeant simultaneously shout “Advance”; the soldiers
hear them both and advance. Since the soldiers obey the superior officer, they
advance because the major orders them to, not because the sergeant does. So the major's command preempts or trumps the sergeant's.
Where other theories have difficulty with trumping cases, Lewis argues his
new theory handles them with ease. Altering the major's command while holding
fixed the sergeant's, the soldiers' response would be correspondingly altered.
In contrast, altering the sergeant's command, while holding fixed the major's,
would make no difference at all.
There is, however, some
reason for scepticism about whether the new theory
handles the examples of late preemption and trumping completely satisfactorily.
In the example of late preemption, Billy's throw has some degree of influence
on the shattering of the bottle. For if Billy had thrown his rock earlier (so
that it preceded Suzy's throw) and in a different manner, the bottle would have
shattered earlier and in a different manner. Likewise, the sergeant's command
has some degree of influence on the soldiers' advance in that if the sergeant
had shouted earlier than the major with a different command, the soldiers would
have obeyed his order. In response to these points, Lewis must say that these
alterations of the events are too distant to be considered
relevant. But some metric of distance in alterations is required,
since it seems that similar alterations of Suzy's throw and the major's
command are relevant to their having causal influence.
It has also been argued
that the new theory generates a great number of spurious instances of
causation. (For discussion see Collins 2000; Kvart
2001.) The theory implies that any event that influences another event to a
certain degree counts as one of its causes. But commonsense is more
discriminating about causes. To take an example of Jonathan Bennett (1987):
rain in December delays a forest fire; if there had been no December rain, the
forest would have caught fire in January rather than when it actually
did in February. The rain influences the fire with respect to its
timing, location, rapidity, and so forth. But commonsense denies that the rain
was a cause of the fire, though it allows that it is a cause of
the delay in the fire. Similarly, in the example of the poison
victim discussed above, the victim's ingesting poison on a full stomach
influences the time and manner of his death (making it a slow and painful
death), but commonsense refuses to countenance his eating dinner as a cause
of his death, though it may countenance it as a cause of its
being a slow and painful death. Pace Lewis, commonsense
does not take anything that affects the time and manner of an event to be a
cause of the event simpliciter.
4.2 Causation as Intrinsic Relation
One way of treating
preemption that has been recently discussed departs from a purely
counterfactual analysis of causation. It has been argued that preemption
examples highlight the intuitive idea that causation is an intrinsic relation
between events, which is to say it is a local relation depending on the intrinsic
properties of the events and what goes on between them, and nothing else. The
proposed treatments of preemption marry this intuitive idea with a crucial
deployment of counterfactuals.
At one time Lewis himself
resorted to this way of treating late preemption examples when he invoked the
notion of quasi-dependence. (See his (1986c).) To explain this notion consider a case that resembles the case of Billy and
Suzy throwing rocks at a bottle. Suzy throws a rock and shatters the bottle in exactly the same way in which she does in the original case.
But in this case Billy and his rock are entirely absent. Lewis argued that
since the process in the original case and the process in the comparison case
are intrinsically alike (and also obey the same laws),
both or neither must be causal. However, the comparison process is surely a
causal process since, thanks to Billy's absence, it exhibits a causal
dependence. Accordingly, the process in the original case must be a causal
process too, even though it does not exhibit a causal dependence. In such
examples Lewis has said that the actual process that does not exhibit causal
dependence is, nonetheless, causal by courtesy: it exhibits quasi-dependence in
virtue of its intrinsic resemblance to the causal process in the comparison
case.
A related idea is pursued
in Menzies (1996; 1999). Menzies argues that there is an element in our concept
of causation that resists capture in purely counterfactual terms. This element
consists in the idea that causation is a structural relation that underlies and
supports causal dependences. This idea can be captured by treating the concept
of causation as the concept of a theoretical entity. Applying a standard
treatment of theoretical concepts, he argues that causation should be defined
as the unique occupant of a certain characteristic role given by the platitudes
of the folk theory of causation. One platitude is that causation is an
intrinsic relation between events. Another platitude is that it is typically,
but not invariably, accompanied by causal dependence. Accordingly, causation is
defined in the following way:
(9) c causes e if
and only if the intrinsic relation that typically accompanies causal dependence
holds between c and e.
On this account, causation
is not constituted by causal dependence. It is, in fact, a distinct relation
for which causal dependence is, at best, a defeasible marker. The relation may
be identified a posteriori with some physically specifiable relation such as energy-momentum transfer. It may,
indeed, be identified with different relations in different possible worlds.
This definition is
supposed to explain commonsense intuitions about preemption examples. For
example, Suzy's throw, and not Billy's throw, caused the shattering of the
bottle, because the intrinsic relation that typically accompanies causal
dependence connects Suzy's throw, but not Billy's throw, with the shattering of
the bottle.
Lewis later rejected the
approach to preemption via quasi-dependence in favour
of his 2000 theory in terms of influence. In Lewis 2004a and 2004b, he claims
that theories of causation as an intrinsic relation do not do justice to the
full range of our intuitions about causation. (For related points see Hall
2002, 2004.) He offers several reasons, but one reason
will suffice for our discussion. The intuition that causation is an intrinsic
matter does not apply to cases of double prevention. Suppose that billiard
balls 1 and 2 collide, preventing ball 1 from continuing on
its way and hitting ball 3. If the collision of balls 1 and 3 had occurred,
ball 3 would not have later collided with ball 4. So, we have double
prevention: the collision of balls 1 and 2 prevented the collision of balls 1
and 3, which would have prevented the later collision of balls 3 and 4. Here it
seems reasonable to say that the collision of balls 1 and 2 was a cause of the
later collision of balls 3 and 4. Lewis observes that the causation in such
cases of double prevention is partly an extrinsic matter. If there had been
some other obstruction that would have stopped ball 1 from hitting ball 3, the
collision of 3 and 4 would not have depended on the collision of 1 and 2.
Moreover, he notes that much of the spatiotemporal region between the collision
of balls 1 and 2 and the collision of balls 3 and 4 is simply empty so that
there is no chain of events to serve as a connecting process between cause and
effect. The intuition that causation is an intrinsic relation does not apply in
this case. More generally, he argues that theories of causation as an intrinsic
relation are overhasty generalisations of one
specific kind of causation, and they fail to do justice to our intuitions about
causation involving absences (as causes, effects or
intermediaries).
4.3 The Structural Equations Framework
A number of contemporary philosophers (Hitchcock 2001, 2007;
Woodward 2003; Woodward and Hitchcock 2003) have explored an alternative
counterfactual approach to causation that employs the structural equations
framework. This framework, which has been used in the social sciences and
biomedical sciences since the 1930s and 1940s, received its state-of-the-art
formulation in Judea Pearl's landmark 2000 book. Hitchcock and Woodward
acknowledge their debt to Pearl's work and to the related work on causal Bayes
nets by Peter Spirtes, Clark Glymour,
and Richard Scheines (1993). However, while Pearl and
Spirtes, Glymour and Scheines focus on issues to do with causal discovery and
inference, Woodward and Hitchcock focus on issues of
the meaning of causal claims. For this reason, their formulations of the
structural equations framework are better suited to
the purposes of this discussion. The exposition of this section follows that of
Hitchcock 2001, in particular. While philosophical
work using this framework has only just begun, the framework
looks likely to rival Lewis's framework in terms of
its theoretical richness and fruitfulness.
The structural equations
framework describes the causal structure of a system in terms of a causal model
of the system, which is identified as an ordered pair <V, E>,
where V is a set of variables and E a set of
structural equations stating deterministic relations among the variables. (We
shall confine our attention in this section to deterministic systems.) The
variables in V describe the different possible states of the
system in question. While they can take any number of values, in the simple
examples to be considered here the variables are typically binary variables
that take the value 1 if some event occurs and the value 0 if the event does
not occur. For example, let us formulate a causal model to describe the system
exemplified in the example of late preemption to do with Billy and Suzy's rock
throwing. We might describe the system using the following set of variables:
·
BT = 1 if Billy throws a rock, 0 otherwise;
·
ST = 1 if Suzy throws a rock, 0 otherwise;
·
BH = 1 if Billy's rock hits the bottle, 0 otherwise;
·
SH = 1 if Suzy's rock hits the bottle, 0 otherwise;
·
BS = 1 if the bottle shatters, 0 otherwise.
Here the variables are
binary. But a different model might have used many-valued variables to
represent the different ways in which Billy and Suzy threw their rocks, their
rocks hit the bottle, or the bottle shattered.
The structural equations
in a model describe the dynamical evolution of the system being modelled. There
is a structural equation for each variable. The form taken by a structural
equation for a variable depends on which kind of variable it is. The structural
equation for an exogenous variable (the values of which are
determined by factors outside of the model) takes the form of Y = y,
which simply states the actual value of the variable. The structural equation
for an endogenous variable (the values of which are determined
by factors within the model) states how the value of the variable is determined
by the values of the other variables. It takes the form:
Y = f(X1,…, Xn)
What does this structural
equation mean? There are in fact competing interpretations. The interpretation favoured by Woodward and Hitchcock is that the equation for
an endogenous variable encodes a set of counterfactuals of the following form:
If it were the case that X1 = x1, X2 = x2,…, Xn = xn,
then it would be the case that Y = f(x1,…,xn).
As this form of
counterfactual suggests, the structural equations are to be read from right to
left: the antecedent of the counterfactual states
possible values of the variables X1 through
to Xn and the consequent
states the corresponding value of the endogenous variable Y. There
is a counterfactual of this kind for every combination of possible values of
the variables X1 through to Xn. It is important to note that a
structural equation of this kind is not, strictly speaking, an identity that is
equivalent to f(X1,…, Xn) = Y: there is a
right-to-left asymmetry built into the equation. Another important feature of
the structural equations for endogenous variables is that they must be complete
in the sense that the equation for a variable Y must express
the value of Y as a function of all and only the
variables Xi on which it counterfactually depends
given the values of the other variables. A crucial question for those
interested in the semantic and metaphysical foundations of the structural
equations framework is the status of the counterfactuals encoded by the
structural equations. Are they semantically and metaphysically primitive so
that the structural equations are simply a summary of the more basic
counterfactuals? Or are the structural equations themselves to be taken as the
conceptual and metaphysical primitives, with the counterfactuals having a
secondary, derivative status? So far there is no consensus on the best way to
answer these questions.
As an illustration,
consider the set of structural equations that might be used to model the late
preemption example of Billy and Suzy. Given the variables listed above, the
structural equations might be stated as follows:
·
ST = 1;
·
BT = 1;
·
SH = ST;
·
BH = BT & ~SH;
·
BS = SH v BH.
In these equations logical
symbols are used to represent mathematical functions on binary variables: ~X =
1 − X; X v Y = max{X, Y}; X & Y =
min{X, Y}. The first two equations simply state the actual
values of the exogenous variables ST and BT. The
third equation encodes two counterfactuals, one for each possible value
of ST. It states that if Suzy threw a rock, her rock hit the
bottle; and if she didn't throw a rock, her rock didn't hit the bottle. The
fourth equation encodes four counterfactuals, one for each possible combination
of values for BT and SH. It states that if Billy
threw a rock and Suzy's rock didn't hit the bottle, Billy's rock hit the
bottle; but it didn't do so if one or more of these conditions was not met. The
fifth equation encodes four counterfactuals, one for each possible combination
of values for SH and BH. It states that if one or
other (or possibly both) of Suzy's rock or Billy's
rock hit the bottle, the bottle shattered; but if neither rock hit the bottle,
the bottle didn't shatter.
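The five equations can be written out as a small program. What follows is only a minimal sketch in Python, not part of Hitchcock's or Pearl's own presentation: the names EQUATIONS and solve are assumptions made for illustration, and the fifth equation is taken to be BS = SH v BH, in line with the gloss just given of the fifth equation.

```python
# Illustrative encoding of the Billy and Suzy model. The names
# EQUATIONS and solve are assumptions of this sketch.

def NOT(x):
    return 1 - x          # ~X = 1 - X

def OR(x, y):
    return max(x, y)      # X v Y = max{X, Y}

def AND(x, y):
    return min(x, y)      # X & Y = min{X, Y}

# One equation per variable, listed so that each variable comes after
# the variables that appear on its right-hand side.
EQUATIONS = {
    "ST": lambda v: 1,                            # exogenous: Suzy throws
    "BT": lambda v: 1,                            # exogenous: Billy throws
    "SH": lambda v: v["ST"],                      # SH = ST
    "BH": lambda v: AND(v["BT"], NOT(v["SH"])),   # BH = BT & ~SH
    "BS": lambda v: OR(v["SH"], v["BH"]),         # BS = SH v BH
}

def solve(equations):
    """Evaluate the equations in their listed order and return all values."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

actual = solve(EQUATIONS)
print(actual)  # {'ST': 1, 'BT': 1, 'SH': 1, 'BH': 0, 'BS': 1}
```

The computed values match the actual course of events: both children throw, Suzy's rock hits, Billy's does not, and the bottle shatters.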
The structural equations
above can be represented in terms of a directed graph. The
variables in the set V are represented as nodes in the graph.
An arrow directed from one node X to another Y represents
the fact that the variable X appears on the right-hand side of
the structural equation for Y. In this case, X is
said to be a parent of Y. Exogenous variables are
represented by nodes that have no arrows directed towards them. A directed
path from X to Y in a graph is a
sequence of arrows that connect X with Y. The
directed graph of the model of the Billy and Suzy example described above is
depicted in Figure 1 below:
Figure 1
The arrows in this figure
tell us that the bottle's shattering is a function of Suzy's rock hitting the
bottle and Billy's rock hitting the bottle; that Billy's rock hitting the
bottle is a function of Billy's throwing a rock and Suzy's rock hitting the
bottle; and that Suzy's rock hitting the bottle is a function of her throwing
the rock. (The existence of an arrow from one variable to another does not
always signify a stimulatory connection. For example, the arrow directed
from SH to BH is inhibitory.)
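The graphical notions of parent, exogenous variable and directed path can be made concrete in a few lines. The following Python sketch is illustrative only: the dictionary PARENTS simply transcribes the arrows described above, and the function name directed_paths is an assumption of the sketch.

```python
# Each variable is mapped to its parents, i.e. the variables on the
# right-hand side of its structural equation (the arrows of the graph).
PARENTS = {
    "ST": [],
    "BT": [],
    "SH": ["ST"],
    "BH": ["BT", "SH"],
    "BS": ["SH", "BH"],
}

def directed_paths(parents, start, end):
    """Return every directed path from start to end as a list of nodes.

    Assumes the graph is acyclic, as it is for this model.
    """
    if start == end:
        return [[end]]
    paths = []
    for child, child_parents in parents.items():
        if start in child_parents:
            for rest in directed_paths(parents, child, end):
                paths.append([start] + rest)
    return paths

# Exogenous variables are the nodes with no incoming arrows.
exogenous = [name for name, ps in PARENTS.items() if not ps]
print(exogenous)                              # ['ST', 'BT']
print(directed_paths(PARENTS, "ST", "BS"))    # two paths, one via BH
print(directed_paths(PARENTS, "BT", "BS"))    # [['BT', 'BH', 'BS']]
```

Note that the graph records only which variables an equation mentions; whether an arrow is stimulatory or inhibitory is fixed by the equation itself, not by the graph.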
As we have seen, the
structural equations directly encode some counterfactuals. However, some
counterfactuals that are not directly encoded can be derived from them.
Consider, for example, the counterfactual “If Suzy's rock had not hit the
bottle, it would still have shattered”. As a matter of fact, Suzy's rock did
hit the bottle. But we can determine what would have happened if it hadn't done
so, by replacing the structural equation for the endogenous variable SH with
the equation SH = 0, keeping all the other equations
unchanged. So, instead of having its value determined in the ordinary way by
the variable ST, the value of SH is set
“miraculously”. Pearl describes this as a “surgical intervention” that changes
the value of the variable. In terms of its graphical representation, this
amounts to wiping out the arrow from the variable ST to the
variable SH and treating SH as if it were an
exogenous variable. After this operation, the value of the variable BS can
be computed and shown to be equal to 1: given that Billy had thrown his rock,
his rock would have hit the bottle and shattered it. So
this particular counterfactual is true. This procedure for evaluating
counterfactuals has direct affinities with Lewis's non-backtracking
interpretation of counterfactuals: the surgical intervention that sets the
variable SH at its hypothetical value but keeps all other
equations unchanged is similar in its effects to Lewis's small miracle that realises the counterfactual antecedent but preserves the
past.
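The evaluation just described can be sketched in Python. This is a minimal illustration under stated assumptions, not Pearl's own formalism: the helper names solve and intervene are hypothetical, and the equation for BS is taken to be BS = SH v BH, as the prose gloss of the fifth equation indicates.

```python
# Billy and Suzy model, with & as min, v as max and ~X as 1 - X.
EQUATIONS = {
    "ST": lambda v: 1,
    "BT": lambda v: 1,
    "SH": lambda v: v["ST"],
    "BH": lambda v: min(v["BT"], 1 - v["SH"]),
    "BS": lambda v: max(v["SH"], v["BH"]),
}

def solve(equations):
    """Evaluate the equations in their listed (parents-first) order."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

def intervene(equations, settings):
    """Replace the equation of each variable in `settings` by a constant.

    All other equations are left unchanged, which is the "surgical"
    character of the intervention.
    """
    new_equations = dict(equations)
    for name, value in settings.items():
        new_equations[name] = lambda v, value=value: value
    return new_equations

# "If Suzy's rock had not hit the bottle, it would still have shattered."
result = solve(intervene(EQUATIONS, {"SH": 0}))
print(result)  # {'ST': 1, 'BT': 1, 'SH': 0, 'BH': 1, 'BS': 1}
```

In the modified model ST keeps its actual value 1 even though SH is 0, which reflects the non-backtracking character of the evaluation.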
In general, to evaluate a
counterfactual, say “If it were the case that X1,…,Xn, then …”, one replaces the original
equation for each variable Xi with a new equation
stipulating its hypothetical value, while keeping the
other equations unchanged; then one computes the values for the remaining
variables to see whether they make the consequent true. This technique of
replacing an equation with a hypothetical value set by a “surgical
intervention” enables us to capture the notion of counterfactual dependence
between variables:
(10)
A variable Y counterfactually depends on
a variable X in a model if and only if it is actually the case that X = x and Y = y and
there exist values x' ≠ x and y' ≠ y such
that replacing the equation for X with X = x' yields Y = y'.
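Definition (10) lends itself to a direct check for binary variables. The sketch below is illustrative only: the function names are assumptions, the domain of every variable is assumed to be {0, 1}, and the equation for BS is taken to be BS = SH v BH.

```python
EQUATIONS = {
    "ST": lambda v: 1,
    "BT": lambda v: 1,
    "SH": lambda v: v["ST"],
    "BH": lambda v: min(v["BT"], 1 - v["SH"]),
    "BS": lambda v: max(v["SH"], v["BH"]),
}

def solve(equations):
    """Evaluate the equations in their listed (parents-first) order."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

def intervene(equations, settings):
    """Replace the equations of the variables in `settings` by constants."""
    new_equations = dict(equations)
    for name, value in settings.items():
        new_equations[name] = lambda v, value=value: value
    return new_equations

def depends(equations, y, x, domain=(0, 1)):
    """Definition (10): does y counterfactually depend on x in the model?"""
    actual = solve(equations)
    for x_alt in domain:
        if x_alt == actual[x]:
            continue
        if solve(intervene(equations, {x: x_alt}))[y] != actual[y]:
            return True
    return False

print(depends(EQUATIONS, "SH", "ST"))  # True
print(depends(EQUATIONS, "BS", "ST"))  # False
print(depends(EQUATIONS, "BS", "BT"))  # False
```

The last two results restate the difficulty of late preemption: in this model the shattering depends counterfactually on neither throw, so simple dependence cannot by itself single out Suzy's throw as the cause.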
How does the structural
equations framework deal with examples of late preemption that pose such
problems for Lewis's counterfactual theory? Can this framework deliver the
intuitively correct verdicts in the example about Suzy and Billy? Halpern and
Pearl (2001, 2005), Hitchcock (2001), and Woodward (2003a) all give roughly the
same treatment of examples of late preemption. The key to their treatment is
the employment of a certain procedure for testing the existence of a causal
relation. The procedure is to look for an intrinsic process connecting the
putative cause and effect; suppress the influence of their non-intrinsic
surroundings by “freezing” those surroundings as they actually are; and then subject the putative cause to a
counterfactual test. So, for example, to test whether Suzy's
throwing a rock caused the bottle to shatter, we should examine
the process running from ST through SH to BS;
hold fixed at its actual value the variable BH, which is
extrinsic to this process; and then wiggle the variable ST to
see if it changes the value of BS. The last steps involve
evaluating the counterfactual “If Suzy hadn't thrown a rock and Billy's rock
hadn't hit the bottle, the bottle would not have shattered”. It is easy to see
that this counterfactual is true. In contrast, when we carry out a similar
procedure to test whether Billy's throwing a rock caused the bottle to shatter, we are required to consider the counterfactual “If
Billy hadn't thrown his rock and Suzy's rock had hit the bottle, the bottle
would not have shattered”. This counterfactual is false. It
is the difference in the truth-value of these two counterfactuals that explains
the fact that it was Suzy's rock throwing, and not Billy's, that caused the
bottle to shatter.
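The two counterfactuals with complex antecedents can be evaluated mechanically. The following Python sketch is offered only as an illustration: the helper names are assumptions, and the equation for BS is taken to be BS = SH v BH.

```python
EQUATIONS = {
    "ST": lambda v: 1,
    "BT": lambda v: 1,
    "SH": lambda v: v["ST"],
    "BH": lambda v: min(v["BT"], 1 - v["SH"]),
    "BS": lambda v: max(v["SH"], v["BH"]),
}

def solve(equations):
    """Evaluate the equations in their listed (parents-first) order."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

def intervene(equations, settings):
    """Replace the equations of the variables in `settings` by constants."""
    new_equations = dict(equations)
    for name, value in settings.items():
        new_equations[name] = lambda v, value=value: value
    return new_equations

# Test of Suzy's throw: freeze BH at its actual value 0, set ST to 0.
suzy_test = solve(intervene(EQUATIONS, {"ST": 0, "BH": 0}))
# Test of Billy's throw: freeze SH at its actual value 1, set BT to 0.
billy_test = solve(intervene(EQUATIONS, {"BT": 0, "SH": 1}))

# Each counterfactual says "the bottle would not have shattered".
print(suzy_test["BS"] == 0)   # True: the first counterfactual holds
print(billy_test["BS"] == 0)  # False: the second counterfactual fails
```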
Hitchcock (2001) presents
a useful regimentation of this reasoning. He defines a route between
two variables X and Z in the set V to
be an ordered sequence of variables <X, Y1,…, Yn, Z>
such that each variable in the sequence is in V and is a parent of
its successor in the sequence. A variable Y is intermediate between X and Z if
and only if it belongs to some route between X and Z.
Then he introduces the new concept of an active causal route:
(11)
The route <X, Y1,…, Yn, Z> is active in
the causal model <V, E> if and only if Z depends
counterfactually on X within the new system of
equations E' constructed from E as follows:
for all Y in V, if Y is
intermediate between X and Z but does not
belong to the route <X, Y1,…, Yn, Z>, then replace the
equation for Y with a new equation that sets Y equal
to its actual value in E. (If there are no intermediate variables
that do not belong to this route, then E' is just E.)
This definition generalises the informal idea sketched in the example of
Suzy and Billy. There is an active causal route going from Suzy's throwing her
rock through her rock hitting the bottle to the bottle shattering: when we hold
fixed Billy's rock not hitting the bottle, which is the actual value of the only
intermediate variable BH that is not on this route, we see
that the bottle's shattering counterfactually depends on Suzy's throwing her
rock. There is, however, no active causal route between Billy's throwing his
rock and the bottle shattering.
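Definition (11) can be turned into a short test procedure. The Python sketch below is one possible rendering under stated assumptions, not Hitchcock's own formulation: all function names are hypothetical, variables are assumed binary, the graph is assumed acyclic, and the equation for BS is taken to be BS = SH v BH.

```python
EQUATIONS = {
    "ST": lambda v: 1,
    "BT": lambda v: 1,
    "SH": lambda v: v["ST"],
    "BH": lambda v: min(v["BT"], 1 - v["SH"]),
    "BS": lambda v: max(v["SH"], v["BH"]),
}
PARENTS = {"ST": [], "BT": [], "SH": ["ST"],
           "BH": ["BT", "SH"], "BS": ["SH", "BH"]}

def solve(equations):
    """Evaluate the equations in their listed (parents-first) order."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

def intervene(equations, settings):
    """Replace the equations of the variables in `settings` by constants."""
    new_equations = dict(equations)
    for name, value in settings.items():
        new_equations[name] = lambda v, value=value: value
    return new_equations

def depends(equations, y, x):
    """Binary case of definition (10): flipping x changes y."""
    actual = solve(equations)
    flipped = solve(intervene(equations, {x: 1 - actual[x]}))
    return flipped[y] != actual[y]

def routes(parents, x, z):
    """All routes from x to z, each variable a parent of its successor."""
    if x == z:
        return [[z]]
    found = []
    for child, child_parents in parents.items():
        if x in child_parents:
            found += [[x] + rest for rest in routes(parents, child, z)]
    return found

def is_active(equations, parents, route):
    """Definition (11): freeze off-route intermediates, then test dependence."""
    x, z = route[0], route[-1]
    actual = solve(equations)
    intermediates = {v for r in routes(parents, x, z) for v in r}
    off_route = intermediates - set(route)
    frozen = intervene(equations, {v: actual[v] for v in off_route})
    return depends(frozen, z, x)

print(is_active(EQUATIONS, PARENTS, ["ST", "SH", "BS"]))  # True
print(is_active(EQUATIONS, PARENTS, ["BT", "BH", "BS"]))  # False
```

On the route from ST through SH to BS the only off-route intermediate is BH, which is frozen at 0. On the route from BT through BH to BS there are no off-route intermediates, since SH lies on no route from BT to BS, so the test reduces to simple dependence in the original equations.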
In terms of the notion of
an active causal route, Hitchcock defines actual or token causation in the
following terms:
(12)
If c and e are distinct actual
events and X and Z are binary variables
whose values represent the occurrence and non-occurrence of these events,
then c is a cause of e if
and only if there is an active causal route from X to Z in
an appropriate causal model <V, E>.
A crucial notion in this
definition is that of “an appropriate” model. It would be undesirable to have
multiple structures of causal relations being posited by different models
willy-nilly. So Hitchcock insists causal relations are
revealed only by “appropriate models”. He mentions a number
of criteria for appraising whether a model is appropriate, the most
important one being that the structural equations posited by the model must not
imply any false counterfactual. In order to deal with
examples of symmetric overdetermination, Hitchcock (2001) defines a notion
of a weakly active route, the essential idea being that there is a
weakly active route between X and Y just
when Y counterfactually depends on X under
the freezing of some possible, not necessarily actual, values of the variables
that are not on the route from X to Y. As we shall
not be considering any examples of symmetric
overdetermination, we shall focus on the stronger notion of an active causal
route.
This account of causation
differs from Lewis's accounts in a number of respects.
One difference is that the account does not appeal to the transitivity of
causation to deal with preemption examples, in contrast to Lewis's accounts,
both early and late. Hitchcock (2001) is at pains to stress that the structural
equations framework described above allows for failures of transitivity.
Another difference between the accounts is that the structural equations
account appeals to special counterfactuals with complex antecedents in order to handle preemption examples. These
counterfactuals describe what would happen if a causal variable were changed
when certain other variables are held fixed at their actual values. (Hitchcock
calls these “explicitly nonforetracking
counterfactuals”.) Lewis's accounts do not make use
of such counterfactuals, relying as they do on counterfactuals with simple
antecedents that describe single changes in the causal variables. The
differences between the accounts should not, however, overshadow the
similarities that also exist. Both accounts make central use of
non-backtracking counterfactuals and they interpret
these counterfactuals in roughly the same fashion. Setting aside complications
to do with backwards causation, Lewis's account and the structural equations
account have us evaluate a non-backtracking counterfactual in much the same
way: we are to hold fixed the past history of the system, imagine that the
antecedent is realised “miraculously” by a surgical
intervention from outside the system, and then consider how the new state of
the system would evolve in conformity with the structural equations or laws of
the system without any further interventions.
How plausible is this new
counterfactual approach to causation? It is too early to say with any
confidence, as the approach is still being developed and it has not been
subjected to sustained, rigorous testing. Nonetheless, some early problems have
emerged. (See Hall 2007; Hitchcock 2007; and Menzies 2004b.) Consider, for
instance, the following example, which is a variant of one described by
Hitchcock (2007). An assassin puts poison in the king's coffee. The bodyguard
responds by pouring an antidote in the king's coffee. If the bodyguard had not
poured the antidote in the coffee, the king would have died. On the other hand,
the antidote is fatal when taken by itself; and if the poison had not been
poured in first, the antidote would have killed the king. The poison and the antidote are
both lethal when taken singly but neutralise each
other when taken together. In fact, the king drinks the coffee and survives.
Suppose we model this
scenario using the following variables:
·
A = 1 if the assassin pours poison into the king's
coffee, 0 otherwise;
·
G = 1 if the bodyguard responds by pouring
antidote into the coffee, 0 otherwise;
·
S = 1 if the king survives, 0 otherwise.
And also suppose that we employ these structural equations:
·
A = 1;
·
G = A;
·
S = (A & G) v (~A & ~G).
The directed graph for
this model is depicted in Figure 2.
Figure 2
Testing for active causal
processes, we can see that the process that goes directly from the assassin's
pouring the poison in the coffee to the king's survival is active. Holding
fixed the fact that the bodyguard poured the lethal antidote into the coffee,
we note that the king would not have survived if the assassin had not put the
poison in the coffee first. So the theory licenses the
verdict that the assassin's pouring in the poison caused the king to survive.
However, many regard this as a mistaken causal verdict: putting poison in the
king's coffee is exactly the kind of thing that is likely to kill the king. It
might be argued that the causal verdict is justified in view
of the fact that the assassin's action caused the bodyguard's action,
which in turn caused the king's survival. But this appeal to the transitivity
of causation is not open to the defenders of this theory, who deny the validity
of transitivity.
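The verdict can be checked by applying the freezing procedure to the three equations of this model. The Python sketch below is illustrative only; the helper names are assumptions, and & and v are rendered as min and max as in the earlier example.

```python
# Poison and antidote model: S = (A & G) v (~A & ~G).
EQUATIONS = {
    "A": lambda v: 1,
    "G": lambda v: v["A"],
    "S": lambda v: max(min(v["A"], v["G"]), min(1 - v["A"], 1 - v["G"])),
}

def solve(equations):
    """Evaluate the equations in their listed (parents-first) order."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

def intervene(equations, settings):
    """Replace the equations of the variables in `settings` by constants."""
    new_equations = dict(equations)
    for name, value in settings.items():
        new_equations[name] = lambda v, value=value: value
    return new_equations

actual = solve(EQUATIONS)
print(actual)  # {'A': 1, 'G': 1, 'S': 1}

# Direct route <A, S>: G is intermediate but off the route, so it is
# frozen at its actual value 1 while A is set to 0.
direct = solve(intervene(EQUATIONS, {"A": 0, "G": actual["G"]}))
print(direct["S"])    # 0: S changes, so the direct route is active

# Route <A, G, S>: nothing is off the route, so only A is changed.
indirect = solve(intervene(EQUATIONS, {"A": 0}))
print(indirect["S"])  # 1: S is unchanged, so this route is not active
```

The second result shows why the verdict cannot be recovered by way of the bodyguard: along the route through G, the king's survival does not depend on the assassin's action at all.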
One counterexample by
itself is not enough to disprove the whole structural equations framework.
Strictly speaking, it only casts doubt on the theory of causation that defines
causation in terms of the presence of an active causal route. Perhaps there are
alternative definitions within the structural equations framework that fare
better. Indeed, a number of philosophers have explored
the possibility of framing a better theory by appealing to a distinction
between what Hitchcock has called “default” and “deviant” values of
variables. (See Hitchcock 2007.) The default value of some variable represents a
normal or to-be-expected state of the system, whereas a deviant value
represents an abnormal or unusual state of the system. The correlative notion of
the default course of evolution for a system can be characterised
as a temporally-ordered sequence of values that the
variables in a model take when the default values of the exogenous variables
are plugged into the structural equations of the model. Thus, if we set the
value of the exogenous variable A in the example above at its
default value 0 instead of its actual value 1, we can see that the scenario
described above will evolve in the following way: the assassin doesn't put the
poison in the coffee, the bodyguard doesn't put the antidote into the coffee,
and the king survives. Now if we evaluate counterfactual dependences with
counterfactuals centred on the default course of
evolution rather than the actual course of evolution, we can see that the bodyguard's
action counterfactually depends on the assassin's action and the king's
survival depends on the bodyguard's action, but the king's survival doesn't
depend on the assassin's action. If counterfactual dependences centred on the default course of evolution are taken to
indicate causal relations, these counterfactual dependences accurately reflect
our intuitive causal judgements. (For further discussion of this idea, see
Menzies 2004a, 2004b, 2007.) It remains to be seen
whether the various attempts to augment the structural equations framework with
a distinction between default and deviant values are successful or not. (For
other attempts see Hall 2007; Hitchcock 2007; and for discussion of the role of
the default/deviant distinction in causal judgements see Maudlin 2004.)
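The default-centred evaluation described above can be sketched as follows. This is only one way of making the idea precise, and the details are assumptions of the sketch rather than part of any of the cited proposals: the default course is computed by setting the exogenous variable A to its default value 0, and dependence is then tested by intervening on that default course. The helper names are hypothetical.

```python
# Poison and antidote model: S = (A & G) v (~A & ~G).
EQUATIONS = {
    "A": lambda v: 1,
    "G": lambda v: v["A"],
    "S": lambda v: max(min(v["A"], v["G"]), min(1 - v["A"], 1 - v["G"])),
}
EXOGENOUS_DEFAULTS = {"A": 0}  # default value of A taken from the text

def solve(equations):
    """Evaluate the equations in their listed (parents-first) order."""
    values = {}
    for name, f in equations.items():
        values[name] = f(values)
    return values

def intervene(equations, settings):
    """Replace the equations of the variables in `settings` by constants."""
    new_equations = dict(equations)
    for name, value in settings.items():
        new_equations[name] = lambda v, value=value: value
    return new_equations

# The default course of evolution: exogenous variables at default values.
DEFAULT_MODEL = intervene(EQUATIONS, EXOGENOUS_DEFAULTS)
default_course = solve(DEFAULT_MODEL)
print(default_course)  # {'A': 0, 'G': 0, 'S': 1}

def depends_on_default(y, x):
    """Does y change when x is flipped away from its default-course value?"""
    flipped = solve(intervene(DEFAULT_MODEL, {x: 1 - default_course[x]}))
    return flipped[y] != default_course[y]

print(depends_on_default("G", "A"))  # True
print(depends_on_default("S", "G"))  # True
print(depends_on_default("S", "A"))  # False
```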