In his analysis of causal explanation in the social, medical and physical sciences, the philosopher Paul Humphreys begins by asserting "that an inviolable requirement of a satisfactory scientific explanation is that it be true." Truth is judged by whether the explanation provides understanding of how the world works. He also takes for granted "that scientific methods, especially experimental methods, have been successful in discovering at least some of the causes that operate in the world, and that they are more successful at this than are unsystematic, nonexperimental ones." (Humphreys 1989: 1-2)
Just because an explanation has been derived by following faithfully the guidelines of a traditionally accepted methodology does not mean that it is true, that it expresses the way the world works. We are, throughout the social, medical and physical sciences, often overwhelmed by prescriptions for how data should be organized so that it can be properly shaped into information and generalized into causal knowledge. (The data-information-knowledge distinction is due to Daniel Bell, according to Paul Starr (1987).) Humphreys, in answering the question "How do we verify that a causal proposition is true?" describes the properties that an acceptable experimental analysis requires in the way of a priori causal knowledge:
Thinking of an experiment or other scientific research study in isolation is, from this point of view, ludicrous. Causal theories must always form the basis for the derivation of new causal knowledge. In Nancy Cartwright's (1989) phrase, "No Causes In, No Causes Out."
Imagine an intervention which is supposed to increase the value of an individual to a free enterprise society. The intervention may be educational, social or economic, but it must be considered as something that can be offered, potentially, to a very large segment of the population. What will be needed as evidence of the efficacy of the intervention, to demonstrate that it is likely to have the desired result? Clearly we need
How should the subjects of the study be chosen? Ideally they should be representative of the population to which the intervention may eventually be directed. Random sampling is one textbook answer to the problem of achieving representativeness, providing a pool of subjects which "on the average" (over repeated samples) should have characteristics similar to those of the target population. Very few studies addressed to large diverse populations actually use random samples, and not many more attempt to demonstrate the representativeness of their subject pools. Instead, evaluations of medical, psychological and social interventions rely on volunteers or convenience samples, whether they are carried out in laboratory or field settings. In their groundbreaking monograph Campbell and Stanley (1966) used the term "external validity" to refer to the extent to which results of a study could be generalized by using the evidence of the study itself. Laboratory experiments as well as field experiments on volunteer subjects do not possess much external validity. Generalizing their conclusions requires the researcher to call on facts, assumptions, other causal propositions, or "common sense" outside the study itself.
Isolating the intervention itself may be problematic: in social experiments extraneous factors may confound the explanatory variable. If training is provided to individuals in a series of Saturday workshops, the act of coming to the workshops is confounded with their content. Even if the intervention has been well-defined, appropriate timing of the measurements must be determined, and assurance provided that the measurement process does not become a part of the intervention. Measurement of an outcome variable is not always straightforward. In the example I have proposed, consider the very real problem of defining the value of the individual to society. Will income or earnings or some psychometric measure be the measured outcome?
The most straightforward analysis of effectiveness involves before and after measurement with the same "instrument", so that their difference provides an estimate of the effectiveness of the intervention. It may be the case, however, that the same instrument cannot be used before and after the intervention. It may also be the case that the time elapsed between measurements is so great that some changes may be reasonably attributed to the aging or maturing of the organism. When these and other related criticisms of a before/after study are raised, the usual solution is to introduce a control group, and to randomly assign subjects to control or intervention status.
Randomization, the random assignment of persons to the two groups by the researcher, enables the statistician to say that any pre-existing differences between the groups will be averaged out (over repetitions of the experiment). It also can help ameliorate the problem of confounding the intervention mentioned above, if the members from the control group are treated in all ways but one like the others. In my example, the control group might attend meetings that were devoid of any relevant training content. In the jargon of biomedical research they get the placebo, the pill that looks and tastes exactly the same as the treatment but is known to have a neutral effect on the body. In a blinded study a subject would not be aware of which group he belongs to.
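The phrase "averaged out (over repetitions of the experiment)" can be illustrated with a small simulation. The following is only a sketch under invented assumptions: the 100 baseline scores, their distribution, and the names `baseline` and `group_difference` are hypothetical and stand for any pre-existing characteristic of the subjects.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical pre-existing characteristic (say, a baseline score) for 100 subjects.
baseline = [rng.gauss(50, 10) for _ in range(100)]

def group_difference(scores, rng):
    """Randomly split the subjects in half; return mean(intervention) - mean(control)."""
    shuffled = scores[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

# Repeat the random assignment many times, with no intervention applied at all.
differences = [group_difference(baseline, rng) for _ in range(2000)]

# Averaged over repetitions, the pre-existing difference is close to zero ...
average_difference = statistics.mean(differences)
# ... although any single assignment can leave a noticeable gap between the groups.
typical_single_gap = statistics.mean([abs(d) for d in differences])

print(f"average difference over repetitions: {average_difference:.3f}")
print(f"typical gap in a single assignment:  {typical_single_gap:.3f}")
```

The contrast between the two printed quantities is the point: randomization guarantees balance only on the average, not in any one realized experiment.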
In a true experiment the researcher has control over who gets the intervention and uses randomization to make assignments. In a true randomized field experiment the pool of subjects is selected randomly from a well defined target population. The test of the Salk polio vaccine in the U.S. in the 1950s is discussed in many textbooks, but it is almost unique in size, scope, and concern with randomization and blinding. Compromises in these rules result in studies that are weaker in their ability to provide unequivocal conclusions about the value of the intervention. Furthermore, rules of evidence are socially defined and can be seen to change from one era to another, from one culture to another, and in our own from one discipline to another.
In the previous section I have tried to show how theoretical or causal knowledge affects the design of an experimental study. When scientists attempt to draw causal conclusions from non-experimental research the dependence on prior knowledge is even greater. It is often claimed that path analysis or its more ambitious cousin, structural equations analysis, is a method for deriving causal knowledge from nonexperimental data. These techniques are closely related to multiple regression analysis, which was applied for such purposes as early as 1900. Here, however, I will begin with the work of Paul Lazarsfeld and Herbert Simon at mid-century.
Lazarsfeld's paper, "Interpretation of Statistical Relations as a Research Operation" (Lazarsfeld and Rosenberg, 1955: 115-125) derives from a 1946 presentation, and was very influential among survey researchers. In it he explains the method of "elaboration", which he claimed was "a research operation familiar to every research laboratory." This was the technique of examining the relationship between two variables through a crosstabulation, then "breaking down" that relationship by controlling for categories of a third variable, and by extension an arbitrary number of variables.
The original relationship is implicitly considered to be potentially understandable as a causal one, with one of the variables viewed as the response, the other as the explanatory variable. In his initial example, a person's age (under 40, over 40) appears to affect the likelihood of listening to religious programs on the radio. (All variables are considered categorical in Lazarsfeld's presentation.) He notes that since "every research man knows that age is related to education" it is natural to look at the original relationship among educationally defined subgroups: high school graduates vs. those who had not graduated from high school. Of course all survey analysts know that just about anything can happen when a third variable is controlled like this! That is, the relationship might remain the same in both groups, it might turn out to be quite different, or it might disappear altogether.
Lazarsfeld introduced a formula which related the original relationship between the independent and dependent variables (X and Y) to the two conditional relationships controlling for the "test factor" T, and to the relationships of T to X and Y. His goal was to instruct the reader in how to reason, how to interpret certain patterns in these quantities (statistically speaking they were covariances) especially when one or more of the relationships was zero or nearly so. To do so required him to bring in one additional piece of knowledge, the position of T in a time-sequence relative to X and Y. Today we would use the term "causal order" to refer to this time-sequence (Davis, 1985). If, for example, T was judged to be antecedent to both X and Y and if the conditional relationships between X and Y controlling for T both vanished, the original relationship would be judged to be spurious. On the other hand if T was a consequence of X then the same numerical evidence would serve to identify it as a mediating factor that adds to our understanding of how X acts on Y.
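The pattern in which both conditional relationships vanish can be made concrete with a small numerical sketch. The counts below are invented for illustration, and the decomposition is written in modern terms (the law of total covariance for a two-category test factor), not in Lazarsfeld's own notation: the marginal covariance of X and Y equals the weighted average of the two conditional covariances plus Cov(X,T)Cov(Y,T)/Var(T).

```python
# Hypothetical counts keyed by (x, y, t), each variable coded 0/1.
# Within each category of T, X and Y are independent by construction.
counts = {
    (1, 1, 1): 64, (1, 0, 1): 16, (0, 1, 1): 16, (0, 0, 1): 4,
    (1, 1, 0): 4,  (1, 0, 0): 16, (0, 1, 0): 16, (0, 0, 0): 64,
}

def covariance(cells, i, j):
    """Covariance of the 0/1 variables at positions i and j of the cell keys."""
    n = sum(cells.values())
    mean_i = sum(c * k[i] for k, c in cells.items()) / n
    mean_j = sum(c * k[j] for k, c in cells.items()) / n
    mean_ij = sum(c * k[i] * k[j] for k, c in cells.items()) / n
    return mean_ij - mean_i * mean_j

def stratum(cells, t):
    """Cells belonging to one category of the test factor T."""
    return {k: c for k, c in cells.items() if k[2] == t}

n_total = sum(counts.values())
marginal_xy = covariance(counts, 0, 1)  # the original X,Y relationship
conditional_xy = {t: covariance(stratum(counts, t), 0, 1) for t in (0, 1)}
weights = {t: sum(stratum(counts, t).values()) / n_total for t in (0, 1)}

# Part of the original relationship that survives within categories of T.
within = sum(weights[t] * conditional_xy[t] for t in (0, 1))
# Part carried by the relationships of T to X and to Y; covariance(T, T) is Var(T).
between = covariance(counts, 0, 2) * covariance(counts, 1, 2) / covariance(counts, 2, 2)

print(f"marginal cov(X,Y) = {marginal_xy:.4f}")
print(f"within T          = {within:.4f}")
print(f"through T         = {between:.4f}")
```

With these counts the whole of the original covariance runs through T. The arithmetic alone cannot say whether that makes the relationship spurious or mediated; that verdict depends on where T is placed in the time sequence.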
Only in Lazarsfeld's penultimate paragraph does the word "causal" enter:
One final point can be cleared up, at least to a certain degree, by this analysis. We can suggest a clearcut definition of the causal relationship between two attributes. If we have a relationship between "x" and "y"; and if for any antecedent test factor the partial relationships between x and y do not disappear, then the original relationship should be called a causal one. (pp. 124-125)
A "clearcut" definition, perhaps, but one which requires two causal assumptions: (1) that we know and have measured any antecedent test factor that might make the X,Y relationship disappear; and (2) that the time sequence of all variables is unambiguously known. On this last point Lazarsfeld is quite optimistic: "As a matter of principle, it is always possible to establish the time sequence of variables. Progress in research consists in getting this point straightened out."
Simon began his 1954 article in the Journal of the American Statistical Association as follows:
Even in the first course in statistics, the slogan "Correlation is no proof of causation!" is imprinted firmly in the mind of the aspiring statistician or social scientist. It is possible that he leaves the course (and many subsequent ones) with no very clear ideas as to what is proved by correlation, but he never ceases to be on guard against "spurious" correlation, that master of imposture who is always representing himself as "true" correlation. (Simon, 1971 (1954):5)
Like Lazarsfeld, he describes how every researcher proceeds to "ordinarily make causal inferences from data on correlations." Since Simon's "research man" seems to be more psychometrically trained than Lazarsfeld's survey analyst, his variables are quantitative and probably are thought to have a multivariate normal distribution. As a result, his procedure consists of calculating the partial correlation between X and Y, given Z, using the classic formula. If the partial is zero, and if Z is antecedent to both X and Y, then the correlation was spurious.
ASIDE: When variables are distributed according to the multivariate normal density, they have some very interesting properties. In particular, the conditional correlation between X and Y at a particular value of Z does not depend on the value of Z. It can be calculated as follows from the correlations:
$$ r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^{2})(1 - r_{yz}^{2})}} $$
This property is in marked contrast to what survey researchers are used to finding when they control for a third variable: the conditional associations in the subgroups are often quite different from each other. It is why it is useful to keep distinct the notions of "partial" and "conditional" correlations. (Cf. Vargha, Rudas, Delaney and Maxwell, 1996.)
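The partial correlation formula in the aside is easy to compute directly. The sketch below is illustrative only: the function name `partial_correlation` and the correlation values are invented for the example and do not come from Simon's article.

```python
import math

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of X and Y given Z, from the three
    pairwise correlations."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("undefined when X or Y is perfectly correlated with Z")
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical case 1: r_xy equals the product r_xz * r_yz, so the partial vanishes.
vanishing = partial_correlation(0.30, 0.60, 0.50)

# Hypothetical case 2: r_xy exceeds that product, so a partial relationship remains.
persisting = partial_correlation(0.50, 0.60, 0.50)

print(f"case 1 partial: {vanishing:.4f}")
print(f"case 2 partial: {persisting:.4f}")
```

In the first case the numerator is zero, which is the numerical pattern that Simon's researcher would read as spurious if Z is antecedent to both X and Y. The calculation itself carries no information about that ordering.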
Simon then shows how to formalize the "common sense" notion of time order by writing down a set of regression equations expressing relations among the variables. The residuals or error terms in these equations are assumed to be uncorrelated with each other, and the assumed time sequence of events leads to specifications that some coefficients in these equations are zero a priori. His italicized conclusion is almost identical to that of Lazarsfeld:
Hence correlation is proof of causation in the two-variable case if we are willing to make the assumptions of time precedence and non-correlation of the error terms.
It would be wise to add to this statement the qualifier that "all relevant variables have been included in the system," just as Lazarsfeld required that "any antecedent test factor" be ruled out as a cause. But wiser even than that would have been to soften the claim to have provided a proof of causation or a solution to the problem of causality. Whether a system of variables in any one study is closed, in the sense that all antecedent causes of X and Y have been included, is questionable in any non-experimental research. If, in an analysis of the type discussed by Simon and Lazarsfeld, the partial relationship fails to vanish then a narrower conclusion should be adopted: "we have not proved that the relationship between X and Y is spurious." This statement relies on fewer a priori assumptions, on less implicit causal knowledge, than the one most researchers want to make: "we have proved that the relationship is not spurious." As in other areas of statistical thinking, the position of the word "not" is critical!
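To see how the assumptions do the work, it may help to write out the spurious three-variable case explicitly. What follows is a sketch in my own notation, not Simon's: Z is taken to be antecedent to X and Y, all variables have mean zero, and the error terms u and v are assumed uncorrelated with Z and with each other.

```latex
\begin{align*}
x &= a z + u, \qquad y = b z + v,
  \qquad \operatorname{Cov}(u,v) = \operatorname{Cov}(u,z) = \operatorname{Cov}(v,z) = 0 \\
\operatorname{Cov}(x,z) &= a \operatorname{Var}(z), \qquad
\operatorname{Cov}(y,z) = b \operatorname{Var}(z), \qquad
\operatorname{Cov}(x,y) = a b \operatorname{Var}(z) \\
\operatorname{Cov}(x,y) &= \frac{\operatorname{Cov}(x,z)\,\operatorname{Cov}(y,z)}{\operatorname{Var}(z)}
\quad\Longrightarrow\quad r_{xy} = r_{xz}\, r_{yz}
\quad\Longrightarrow\quad r_{xy \cdot z} = 0
\end{align*}
```

The vanishing partial follows only because the error terms were assumed uncorrelated. If an omitted antecedent variable influenced both u and v, the term Cov(u,v) would reappear in Cov(x,y) and the partial correlation would no longer be zero, even though X still has no effect on Y.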
Neither Lazarsfeld nor Simon was overly concerned with the niceties of statistical inference. The logic of the process was sufficient: a zero partial (or conditional) correlation meant that the original relationship was spurious (assuming the appropriate causal order of the variables). If we use the terminology of hypothesis testing, however, we must raise some other issues. What significance level should be used to decide whether to reject the null hypothesis? Is repeated random sampling, or replicable randomized assignment in experiments, really possible? Is our dataset really one out of an infinite set of equally likely possibilities? How probable is it, given the evidence, that a causal relationship exists? Gerd Gigerenzer (1993) has used a Freudian metaphor that identifies this last question as the one that the researcher's Id wants to have answered, but which is "censored" by the Superego's emphasis on the importance of technical assumptions (which fail to be satisfied in some way in all research) and by the Ego's pragmatic presentation of p values.
It appears to be easy to confuse the truth of a statement with the quality of evidence provided in support of it, for over and over we find questions of substantive truth being passed over in favor of critiques of the methodology used to provide evidence. If, for instance, the only direct evidence for the efficacy of an intervention comes from non-experimental studies (as in the earliest reports of linkages between cigarette smoke and lung cancer), the claim may be rejected out of hand by some on the grounds that "correlation is not causation." Despite the fact that much (perhaps most) of what we consider scientific knowledge was not "proven" through the medium of a randomized experiment, it is still common to find opponents of a claim asserting that only evidence from an experiment will satisfy them.
Paul Brodeur has written extensively on the risks associated with low-level radiation such as that produced by power lines, electric substations and computer terminals (CRTs). In one essay in The New Yorker (July, 1994) he quotes a utility spokesman commenting on a clustering of brain tumor cases in people living in the vicinity of a substation:
It's a terribly unfortunate coincidence, .... If it's electromagnetic fields we're talking about, we are aware of studies that show a correlation only. They don't demonstrate cause and effect.
The spokesman's subtext, of course, is that in the absence of a "demonstration" of cause and effect, we ought to believe in the absence of cause and effect. The presence of a correlation, i.e., an observed regularity of relationship between two variables, is almost used as if it were proof of lack of a causal relationship! An attitude like this is displayed by Birnbaum and Mellars (1992) in their discussion of self-selected samples. "It's easy to be fooled by Mother Nature", and "correlation is the tool of the devil" are two of their warnings to researchers who propose to use models to aid in their data analysis efforts.