Broadly conceived, **social work research** is any process by which information is systematically gathered for the purpose of answering questions, examining ideas, or testing theories related to social work practice.

A **variable** is any research concept that has two or more values. The **values** or **attributes** of a variable are the different categories or ordered units that a variable can take. The **independent variable** predicts or explains other variables. In an experiment, the independent variable is the variable that is manipulated. The outcome of this manipulation is determined by measuring the dependent variable. A **dependent variable** is an outcome variable whose values are related to the values of the independent variable. **Data** are generally recorded values of variables. **Quantitative variables** take numerical values whose "size" is meaningful. Quantitative variables answer questions such as "how many?" or "how much?" It makes sense to add, subtract, and compare the magnitudes of two people's weights or two families' incomes. Quantitative variables usually have measurement units, such as pounds, dollars, years, *etc*.

Some variables, such as social security
numbers or zip codes, take numerical values, but are not quantitative---they
are **qualitative or categorical**. The
sum of two zip codes or social security numbers is not meaningful. The average
of a list of zip codes is not meaningful. Qualitative or categorical variables
typically do not have units. Qualitative or categorical variables such as gender, hair color, or ethnicity group individuals; their values have neither a "size" nor, typically, a natural ordering. They
answer questions such as "which kind?" The values such variables take
are typically adjectives (red, medium, hot). Arithmetic with qualitative
variables, even ones that take numerical values, does not usually make sense.
Categorical variables divide individuals into categories, such as gender,
ethnicity, age group, or whether or not the individual finished high school. Note
that it is common to ** code** categorical and qualitative
variables by numbers, for example, 1 for male and 0 for female.
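As a small illustration (hypothetical data; Python is used here only for demonstration), coding gender as 1/0 lets us count category members, but arithmetic on the codes has no meaningful "size":

```python
# Hypothetical illustration: coding a categorical variable numerically.
# The codes 1 (male) and 0 (female) are labels, not quantities.
genders = ["male", "female", "female", "male", "female"]
codes = [1 if g == "male" else 0 for g in genders]
print(codes)                      # [1, 0, 0, 1, 0]

# Counting category membership is meaningful:
print(codes.count(1))             # 2 (two males)

# But the "average gender" is just the proportion coded 1, not a size:
print(sum(codes) / len(codes))    # 0.4
```

The point is that the numbers are interchangeable labels: coding male as 7 and female as 3 would "work" equally well, which is exactly why sums and averages of the codes mean nothing by themselves.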

It is sometimes useful to divide quantitative
variables further into **discrete** and **continuous variables**. (This division is
sometimes rather artificial.) The set of possible values of a discrete variable
is countable. Examples of discrete variables include ages measured to the
nearest year, the number of people in a family, and stock prices on the New
York Stock Exchange (although that might change soon). In the first two of
these examples, the variable can take only some positive integers as values. In
all three examples, there is a minimum spacing between the possible values.
Most discrete variables are like this--they are "chunky." Variables
that count things are always discrete.

Examples of continuous variables include
things like the exact ages or heights of individuals, the exact temperature of
something, *etc.* There is no minimum spacing between
the possible values of a continuous variable. The possible values of discrete
variables don't necessarily have a minimum spacing. (Technical note: this is
because the set of fractions (rational numbers) is countable, but there is no
minimum spacing between fractions.) One reason the distinction between discrete
and continuous variables is somewhat vague is that in practice there is always
some limited precision to which we can measure any variable, depending on the
instrument we are using to make the measurement. For most purposes, the
distinction is not important.
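The technical note above can be made concrete with a short Python sketch: the fractions 1/2, 1/4, 1/8, ... form a countable set, yet the spacing between consecutive values shrinks without bound, so countability alone does not imply a minimum spacing.

```python
from fractions import Fraction

# Family size is a typical discrete variable: only certain values are
# possible, with a minimum spacing of 1 between them.
family_sizes = [2, 3, 5, 4]

# The technical note: a countable set need not have a minimum spacing.
# The gaps between consecutive fractions 1/2, 1/4, 1/8, ... keep shrinking.
gaps = [Fraction(1, 2**k) - Fraction(1, 2**(k + 1)) for k in range(1, 5)]
print(gaps)   # [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)]
```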

A **correlational relationship** is one in which two variables perform in a synchronized manner. Economists have assumed that there is a correlation between inflation and unemployment. When inflation is high, unemployment also tends to be high. When inflation is low, unemployment also tends to be low. The two variables are correlated. But knowing that two variables are correlated does not tell us whether one causes the other. What if, for instance, we observe a correlation between the number of roads built in a country and that country's birthrate?
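The inflation and unemployment illustration can be sketched numerically. Below is a minimal example (the figures are invented, not real economic data) that computes Pearson's r, the standard measure of linear correlation:

```python
# Pearson correlation coefficient, computed from scratch.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

inflation    = [2.0, 3.5, 5.0, 6.5, 8.0]   # hypothetical percentages
unemployment = [4.0, 4.8, 6.1, 6.9, 8.2]   # hypothetical percentages

r = pearson_r(inflation, unemployment)
print(round(r, 3))   # 0.997 -- the two series move together almost perfectly
```

A value of r near 1 says the variables are synchronized; it says nothing about which (if either) causes the other.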

**Models** are approximations or representations
of reality. Many models are extremely simple and require unrealistic
assumptions. But, if we are to judge by historical developments in the natural
sciences, it is best to begin with relatively simple models and assumptions
that can then be gradually modified and made more complex.

A
commonly accepted position is that science contains two distinct languages or
ways of defining concepts, which are referred to as the theoretical and
operational languages. There appears to be no purely logical way of bridging
the gap between these languages.

Reality,
or at least the perception of reality, consists of ongoing processes. No two
events are ever exactly repeated, nor does any object or organism remain
precisely the same from one moment to the next. And yet, if we are ever to
understand the nature of the real world, we must act as though events are repeated
and as if objects do have properties that remain constant for some period of
time, however short. Unless we permit ourselves to make such simple
assumptions, we shall never be able to generalize beyond the single and unique
event.

The
dilemma is to select models that are at the same time simple enough to permit
us to think with the aid of the model but also sufficiently realistic that the
simplifications required do not lead to predictions that are highly inaccurate.
Put differently, the basic dilemma faced in all sciences is that of how much to
simplify reality.

According to Bunge (1959) [Mario Bunge, __Causality__], at the heart of **cause** is the idea of “producing”--a notion
that seems similar to that of forcing. If X is a cause of Y, then a change in X
is followed by, or associated with, a change in Y. Although the idea of
constant conjunction is part of a definition of causality, conjunction is not
sufficient to distinguish a causal relationship from other types of
association. For example, day is always followed by night, childhood by
adolescence, but we do not think of the first phenomenon in each pair as a cause
of the second. The idea of production or forcing is absent.

An
empiricist objection to the idea that certain causes involve a producing or
forcing phenomenon is that these forces cannot be observed or measured. The
notion of prediction is often used, particularly in the statistical literature,
to avoid this empiricist objection to causal terminology. But, the substitution
of predictive for causal ideas yields its own difficulties. For example, it limits the ability to think theoretically, and often does not allow adequately for asymmetrical relationships.

Phillip Frank (1961) [__Modern Science and its Philosophy__] regards causal laws as fundamentally hypothetical.

It
is because of this hypothetical nature of causal laws that they can never be
tested empirically, in the strictest sense of the word. Since it is always
possible that some unknown forces may be operating to disturb a given causal
relationship or to lead us to believe a causal relationship exists when in fact
it does not, the only way we can make causal inferences is to make simplifying
assumptions about these “disturbing” influences.

While
we might try to produce a change in Y, we cannot be sure that it is our X that
did this. We must assume either that all other causes of Y have literally been
held constant, or, that if not held constant, the effects of these variables
can safely be ignored. Following Leslie Kish (1959) [Some Statistical Problems in Research Design, __American Sociological Review__, __24__(2), 328-38], it is useful to distinguish among **four types of variables that are capable of producing changes in Y**. First, there is the particular independent variable (or variables) with which we are directly concerned. Second, there may be
a number of variables that are potential causes of Y but that do not vary in
the experimental situation. Presumably, many of these variables have been
brought under control and are known to the investigator.

The
third class of variables consists of all variables that are not under control
and that produce changes in Y during the course of the experiment, but that
have effects on Y that are unrelated to those of X, the independent variable
under consideration.

The fourth type involves variables whose effects are systematically
related to those of the independent variable X, so that the influences of these
variables will be confounded with those of X. One way for a confounding
influence to operate would be for such a variable to be a cause of both X and Y
(i.e., have a spurious effect). In the ideal experiment there would be no
variables of types 3 and 4, and presumably, any changes in Y could be ascribed
to changes in X. Kish’s type 3 and type 4 variables are more commonly called **confounding variables** (also known as **extraneous** or **lurking** variables, or sometimes simply **confounds**).

A confounding variable is external to a statistical model, affects the response and/or predictor variables, and makes causal assessments dubious. **For example, consider a study in which a new drug for the control of hypertension (high blood pressure) is being investigated. We set up a trial in which two groups are compared, one group with the drug, another with a placebo. When we look at the data we find the group receiving the drug has lower blood pressure than the control group. However, we notice that the average age of the group receiving the drug is lower than that of the control group. Hypertension is age-related, and therefore, the difference in blood pressure between the two groups might be a result of age differences rather than the effect of the drug. Age differences, then, could have confounded the findings. An effective way to control for potential confounding factors is good experimental design.**
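The hypertension example can be simulated with a deliberately artificial toy model (all numbers invented). Here the drug has no effect at all; blood pressure depends only on age, yet the naive group comparison makes the drug look effective because the drug group happens to be younger:

```python
# Toy simulation of the hypertension trial. In this model the drug has
# NO effect: blood pressure is a made-up linear function of age alone.
def blood_pressure(age, on_drug):
    # on_drug is deliberately ignored -- the drug does nothing here
    return 100 + 0.8 * age   # mmHg

drug_ages    = list(range(30, 50))   # drug group happens to be younger
placebo_ages = list(range(50, 70))   # placebo group is older

drug_bp    = [blood_pressure(a, True) for a in drug_ages]
placebo_bp = [blood_pressure(a, False) for a in placebo_ages]

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison: the drug group "looks" better...
print(round(mean(drug_bp), 1), round(mean(placebo_bp), 1))   # 131.6 147.6

# ...but comparing like with like (same age) reveals no drug effect at all:
print(blood_pressure(55, True) == blood_pressure(55, False))   # True
```

Good design, such as randomly assigning subjects to the two groups, would have balanced age across groups and prevented this confound.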

A **causal relationship** exists between two variables when (1) they perform in a synchronized manner (there is a conjunction), (2) a change in the value of one precedes a change in the value of the other, and (3) the relationship is not **spurious** (a relationship is spurious when the change in each variable is __solely__ the result of a change in a third variable). In the aforementioned example, it may be that there is a third variable that is causing both the building of roads and the birthrate, and hence the correlation we observe. For instance, perhaps the world economy is responsible for both: when the economy is good, more roads are built and the birthrate rises.
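The roads-and-birthrate idea can be sketched with invented numbers: a third variable (an economic index) drives both, producing a perfect correlation between two variables that have no causal link to each other.

```python
# A toy spurious relationship: the economy drives both road building and
# the birthrate; roads and births never influence one another.
economy = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical economic index

roads  = [10 * e for e in economy]          # road building tracks the economy
births = [5 * e for e in economy]           # birthrate tracks the economy

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(round(pearson_r(roads, births), 6))   # 1.0 -- perfectly correlated,
                                            # yet neither causes the other
```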

In contrast to spuriousness (sometimes termed a pseudo-relationship), **interaction** occurs when there is a relationship between variables, but the strength or the direction of the association between these variables depends on the value of a third. Alternatively, two variables are said to **interact** when they affect the dependent variable in a non-additive fashion. Occasionally used synonyms include **modifier** and **moderator**.
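A minimal sketch of interaction with invented numbers: here the effect of one variable (a hypothetical tutoring program) on test scores depends on the level of a third variable (prior preparation), so the two affect the outcome non-additively.

```python
# Toy interaction: the effect of tutoring on a test score depends on
# prior preparation. All numbers are made up for illustration.
def score(tutored, well_prepared):
    base = 60
    if tutored and well_prepared:
        return base + 25      # tutoring helps a lot when well prepared
    if tutored:
        return base + 5       # tutoring helps only a little otherwise
    if well_prepared:
        return base + 10
    return base

# Effect of tutoring at each level of preparation:
effect_prepared   = score(True, True) - score(False, True)     # 15
effect_unprepared = score(True, False) - score(False, False)   # 5

print(effect_prepared, effect_unprepared)   # 15 5 -- the effect differs,
                                            # so tutoring and preparation interact
```

If the two variables were purely additive, the tutoring effect would be the same number at every level of preparation; the unequal effects (15 vs. 5) are the signature of interaction.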

Pat
Dattalo--August 2002