## Variables

Broadly conceived, social work research is any process by which information is systematically gathered for the purpose of answering questions, examining ideas, or testing theories related to social work practice.

A variable is any research concept that has two or more values. The values, or attributes, of a variable are the different categories or ordered units that the variable can take. The independent variable predicts or explains other variables. In an experiment, the independent variable is the variable that is manipulated. The outcome of this manipulation is determined by measuring the dependent variable. A dependent variable is an outcome variable whose values are related to the values of the independent variable. Data are generally recorded values of variables. Quantitative variables take numerical values whose "size" is meaningful. Quantitative variables answer questions such as "how many?" or "how much?" It makes sense to add, subtract, and compare the magnitudes of two people's weights or two families' incomes. Quantitative variables usually have measurement units, such as pounds, dollars, or years.

Some variables, such as social security numbers or zip codes, take numerical values but are not quantitative; they are qualitative or categorical. The sum of two zip codes or social security numbers is not meaningful. The average of a list of zip codes is not meaningful. Qualitative or categorical variables typically do not have units. Qualitative or categorical variables, such as gender, hair color, or ethnicity, group individuals into categories; their values have neither a "size" nor, typically, a natural ordering. They answer questions such as "which kind?" The values such variables take are typically adjectives (red, medium, hot). Arithmetic with qualitative variables, even ones that take numerical values, does not usually make sense. Categorical variables divide individuals into categories, such as gender, ethnicity, age group, or whether or not the individual finished high school. Note that it is common to code categorical and qualitative variables by numbers, for example, 1 for male and 0 for female. The fact that a category is labeled with a number does not make the corresponding variable quantitative. The real issue is whether arithmetic with the values makes sense.
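
The point about numeric labels can be made concrete with a small sketch. The records below are invented for illustration, and the variable names are assumptions rather than anything from a real data set. The code shows that a computer will happily average zip codes, but that only counts and the most common category are meaningful summaries for such a variable, while the mean is a sensible summary for incomes.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical records (invented for illustration):
# zip codes are labels, incomes are quantities measured in dollars.
zip_codes = [23220, 23220, 23284, 90210]
incomes = [31000, 42000, 38500, 120000]

# The arithmetic runs, but 39983.5 is not a place and not an amount of anything.
zip_mean = mean(zip_codes)

# Meaningful summaries for a categorical variable: category counts and the mode.
zip_counts = Counter(zip_codes)
most_common_zip = mode(zip_codes)

# For a quantitative variable, the mean does have an interpretation.
mean_income = mean(incomes)

print("mean of zip codes (not meaningful):", zip_mean)
print("zip code counts:", dict(zip_counts))
print("most common zip code:", most_common_zip)
print("mean income in dollars:", mean_income)
```

The same reasoning applies to a 0/1 coding of gender: the codes are names for categories, so the useful summaries are how many individuals fall into each category.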

It is sometimes useful to divide quantitative variables further into discrete and continuous variables. (This division is sometimes rather artificial.) The set of possible values of a discrete variable is countable. Examples of discrete variables include ages measured to the nearest year, the number of people in a family, and stock prices on the New York Stock Exchange (although that might change soon). In the first two of these examples, the variable can take only some positive integers as values. In all three examples, there is a minimum spacing between the possible values. Most discrete variables are like this--they are "chunky." Variables that count things are always discrete.

Examples of continuous variables include the exact ages or heights of individuals, the exact temperature of something, etc. There is no minimum spacing between the possible values of a continuous variable. The possible values of a discrete variable do not necessarily have a minimum spacing either. (Technical note: this is because the set of fractions (rational numbers) is countable, but there is no minimum spacing between fractions.) One reason the distinction between discrete and continuous variables is somewhat vague is that in practice there is always some limited precision to which we can measure any variable, depending on the instrument we are using to make the measurement. For most purposes, the distinction is not important.

A correlational relationship is one in which two variables perform in a synchronized manner. Economists have assumed that there is a correlation between inflation and unemployment. When inflation is high, unemployment also tends to be high. When inflation is low, unemployment also tends to be low. The two variables are correlated. But, knowing that two variables are correlated does not tell us whether one causes the other. What if, for instance, we observe a correlation between the number of roads built in Europe and the number of children born in the United States? Does that mean that if we want fewer children in the U.S. we should build fewer roads in Europe? Or, does it mean that if there are too few roads in Europe we should encourage U.S. citizens to have more babies? I hope not. While there is a relationship between the number of roads built and the number of babies born, we cannot conclude that the relationship is a causal one.

Models are approximations or representations of reality. Many models are extremely simple and require unrealistic assumptions. But, if we are to judge by historical developments in the natural sciences, it is best to begin with relatively simple models and assumptions that can then be gradually modified and made more complex.

A commonly accepted position is that science contains two distinct languages or ways of defining concepts, which are referred to as the theoretical and operational languages. There appears to be no purely logical way of bridging the gap between these languages.

Reality, or at least the perception of reality, consists of ongoing processes. No two events are ever exactly repeated, nor does any object or organism remain precisely the same from one moment to the next. And yet, if we are ever to understand the nature of the real world, we must act as though events are repeated and as if objects do have properties that remain constant for some period of time, however short. Unless we permit ourselves to make such simple assumptions, we shall never be able to generalize beyond the single and unique event.

The dilemma is to select models that are at the same time simple enough to permit us to think with the aid of the model but also sufficiently realistic that the simplifications required do not lead to predictions that are highly inaccurate. Put differently, the basic dilemma faced in all sciences is that of how much to simplify reality.

According to Bunge (1959) [Mario Bunge, Causality. Cambridge: Harvard University, pp. 46-48.], one essential ingredient in the scientist’s conception of a cause is the idea of “producing,” a notion that seems similar to that of forcing. If X is a cause of Y, then a change in X is followed by, or associated with, a change in Y. Although the idea of constant conjunction is part of a definition of causality, conjunction is not sufficient to distinguish a causal relationship from other types of association. For example, day is always followed by night, and childhood by adolescence, but we do not think of the first phenomenon in each pair as a cause of the second. The idea of production or forcing is absent.

An empiricist objection to the idea that certain causes involve a producing or forcing phenomenon is that these forces cannot be observed or measured. The notion of prediction is often used, particularly in the statistical literature, to avoid this empiricist objection to causal terminology. But the substitution of predictive for causal ideas yields its own difficulties. For example, it limits the ability to think theoretically, and often does not allow adequately for asymmetrical relationships.

Philipp Frank (1961) [Modern Science and its Philosophy. New York: Collier Books, chapter 1.] argued that causal laws are essentially working assumptions or tools of the scientist rather than verifiable statements about reality. The scientist, then, assumes causal laws. When they appear to be violated, they are reformulated to account for existing facts. A causal relationship between two variables cannot be evaluated empirically unless we can make certain simplifying assumptions about other variables. Causal statements or laws, then, are purely hypothetical. They are of the if-then form: if a system is isolated, or if there are no other variables operating, then a change in X produces a change in Y.

It is because of this hypothetical nature of causal laws that they can never be tested empirically, in the strictest sense of the word. Since it is always possible that some unknown forces may be operating to disturb a given causal relationship or to lead us to believe a causal relationship exists when in fact it does not, the only way we can make causal inferences is to make simplifying assumptions about these “disturbing” influences.

While we might try to produce a change in Y, we cannot be sure that it is our X that did this. We must assume either that all other causes of Y have literally been held constant or that, if not held constant, the effects of these variables can safely be ignored. Following Leslie Kish (1959) [Some Statistical Problems in Research Design, American Sociological Review, 24(2), 328-38], it is useful to distinguish among four types of variables that are capable of producing changes in Y. First, there is the particular independent variable (or variables) with which we are directly concerned. Second, there may be a number of variables that are potential causes of Y but that do not vary in the experimental situation. Presumably, many of these variables have been brought under control and are known to the investigator.

The third class of variables consists of all variables that are not under control and that produce changes in Y during the course of the experiment, but that have effects on Y that are unrelated to those of X, the independent variable under consideration.

The fourth type involves variables whose effects are systematically related to those of the independent variable X, so that the influences of these variables will be confounded with those of X. One way for a confounding influence to operate would be for such a variable to be a cause of both X and Y (i.e., have a spurious effect). In the ideal experiment there would be no variables of types 3 and 4, and presumably, any changes in Y could be ascribed to changes in X. Kish’s type 3 and type 4 variables are more commonly called confounding variables (also known as extraneous or lurking, and sometimes simply called a confound).

A confounding variable is external to a statistical model, affects the response variable, the predictor variables, or both, and makes causal assessments dubious. For example, consider a study in which a new drug for the control of hypertension (high blood pressure) is being investigated. We set up a trial in which two groups are compared, one group with the drug, another with a placebo. When we look at the data we find the group receiving the drug has lower blood pressure than the control group. However, we notice that the average age of the group receiving the drug is lower than that of the control group. Hypertension is age-related, and therefore the difference in blood pressure between the two groups might be a result of age differences rather than the effect of the drug. Age differences, then, could have confounded the findings. An effective way to control for potential confounding factors is good experimental design.
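
The hypertension example can be sketched as a small simulation. Everything in it is an assumption made for illustration: the function names, the age range, the formula linking age to blood pressure, and above all the choice to give the drug no effect at all. Under those assumptions, a trial in which younger people are more likely to receive the drug shows a sizable difference between groups, while random assignment (one form of good experimental design) shows a difference near zero.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is reproducible


def blood_pressure(age):
    # Assumption for illustration: pressure rises with age, plus random noise.
    # The drug is deliberately given NO effect, so it does not appear here.
    return 100 + 0.5 * age + rng.gauss(0, 5)


# Hypothetical participants aged 30 to 80.
ages = [rng.uniform(30, 80) for _ in range(4000)]


def group_difference(assign):
    """Mean pressure of the drug group minus mean pressure of the placebo group."""
    drug, placebo = [], []
    for age in ages:
        group = drug if assign(age) else placebo
        group.append(blood_pressure(age))
    return mean(drug) - mean(placebo)


# Confounded design: the younger the participant, the more likely the drug.
confounded_diff = group_difference(lambda age: rng.random() < (80 - age) / 50)

# Randomized design: a coin flip decides, so age is balanced across groups.
randomized_diff = group_difference(lambda age: rng.random() < 0.5)

print("difference with age-related assignment:", round(confounded_diff, 2))
print("difference with random assignment:", round(randomized_diff, 2))
```

In the confounded design the drug group looks better only because it is younger. Randomization does not remove the effect of age on blood pressure; it only makes age unrelated to who receives the drug.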

A causal relationship exists between two variables when (1) they perform in a synchronized manner (there is a conjunction), (2) a change in the value of one precedes a change in the value of the other, and (3) the relationship is not spurious (i.e., the change in each variable is not solely the result of a change in a third variable). In the earlier example of roads and births, it may be that a third variable causes both the building of roads and the birthrate, and thereby produces the correlation we observe. For instance, perhaps the world economy is responsible for both. When the economy is good, more roads are built in Europe, and more children are born in the U.S. The key lesson here is that you have to be careful how you interpret a correlation. If you observe a correlation between the number of hours students use the computer to study and their grade point averages (with high computer users getting higher grades), you cannot assume that the relationship is causal. In this case, the third variable might be socioeconomic status: richer students, who have greater resources at their disposal, tend both to use computers more and to earn better grades. It is the resources that drive both use and grades, not computer use that causes the change in the grade point average.
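
A spurious relationship of this kind can be sketched with simulated data. The numbers below are assumptions chosen for illustration, not estimates of anything real: a hypothetical "economy" score drives both roads and births, and neither of those two affects the other. The sketch computes the ordinary correlation between roads and births, then the correlation that remains after the linear influence of the economy has been removed from each (a partial correlation).

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after a simple linear regression of ys on zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Assumed data-generating process: the economy causes both variables;
# roads and births have no direct effect on each other.
economy = [rng.gauss(0, 1) for _ in range(n)]
roads = [2.0 * e + rng.gauss(0, 1) for e in economy]
births = [1.5 * e + rng.gauss(0, 1) for e in economy]

raw_r = pearson(roads, births)
partial_r = pearson(residuals(roads, economy), residuals(births, economy))

print("correlation of roads and births:", round(raw_r, 3))
print("correlation after removing the economy:", round(partial_r, 3))
```

Under these assumptions the raw correlation is strong, while the partial correlation is close to zero, which is the pattern expected when a third variable accounts for the association. In real research the third variable must be identified and measured before it can be controlled in this way.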

In contrast to spuriousness (sometimes termed a pseudo-relationship), interaction occurs when there is a relationship between variables, but the strength or the direction of the association between these variables depends on the value of a third. Alternatively, two variables are said to interact when they affect the dependent variable in a non-additive fashion. The third variable is occasionally called a modifier or moderator.
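
A minimal sketch of what "non-additive" means, using invented group means: the program, the support levels, and the scores are all hypothetical. If the two variables acted additively, the effect of the program would be the same at both levels of support; the difference between the two effects measures the interaction.

```python
# Hypothetical mean outcome scores for four groups (values invented for illustration).
cell_means = {
    ("no_program", "low_support"): 50,
    ("program", "low_support"): 52,
    ("no_program", "high_support"): 55,
    ("program", "high_support"): 70,
}


def program_effect(support):
    """Difference in mean outcome between program and no program at one support level."""
    return cell_means[("program", support)] - cell_means[("no_program", support)]


effect_low = program_effect("low_support")    # 52 - 50 = 2
effect_high = program_effect("high_support")  # 70 - 55 = 15

# Zero would mean the effects simply add; a nonzero value indicates interaction.
interaction = effect_high - effect_low        # 15 - 2 = 13

print("program effect with low support:", effect_low)
print("program effect with high support:", effect_high)
print("interaction contrast:", interaction)
```

Here support acts as the moderator: the strength of the association between program and outcome depends on its value.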

Pat Dattalo--August 2002