As a summary of some topics that may have been overlooked in class, here are a few interesting facts about R-square related concepts.

**R-squared**, often called the coefficient of determination, is
defined as the ratio of the sum of squares explained by a regression model
to the "total" sum of squares around the mean

*R^{2} = 1 - SSE / SST*

in the usual ANOVA notation. Most people refer to it as the proportion
of variation explained by the model, but sometimes it is called the proportion
of variance explained. This is misleading because SST is not the **variance** of
Y. In sample terminology, variances are "mean squares." Thus the estimated
variance of Y is

*MST = SST / (n - 1)*

where n is the number of cases.

Regression analysis programs also calculate an "adjusted" R-square. The best way to define this quantity is:

*R^{2}_{adj} = 1 - MSE / MST*

since this emphasizes its natural relationship to the coefficient of determination.
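As a minimal sketch of the two definitions above, the following Python computes both quantities from the sums of squares of a least-squares fit. The function name `r_squared_pair`, the simulated data, and the use of NumPy are illustrative assumptions; the degrees of freedom assume a model with an intercept and p predictors.

```python
import numpy as np

def r_squared_pair(y, X):
    """Return (R-squared, adjusted R-squared) for OLS of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)                    # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    mse = sse / (n - p - 1)                       # error mean square
    mst = sst / (n - 1)                           # estimated variance of y
    return 1 - sse / sst, 1 - mse / mst

# Simulated data (hypothetical): only the first predictor matters.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=50)
r2, r2_adj = r_squared_pair(y, x)
print(round(r2, 3), round(r2_adj, 3))
```

The only difference between the two results is that the adjusted version divides each sum of squares by its degrees of freedom before taking the ratio.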

While R-squared will never increase when a predictor is dropped from
a regression equation, the adjusted R-squared *may* be larger. Specifically,
if the **t-ratio for a predictor** is less than one in absolute value, dropping
that predictor from the model will increase the adjusted R-squared. Sometimes
you will come across an article in which the researcher keeps everything
with a t bigger than 1 in the model. The motivation for doing that is to
get as large an adjusted R-squared as possible. Note that the

Here is the traditional formula for expressing the adjusted R-squared in terms of the ordinary R-squared. It shows explicitly the "adjustment" process, and also demonstrates that the adjusted R-squared is always smaller:
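One common way of writing this relationship follows from substituting MSE = SSE / (n - p - 1) and MST = SST / (n - 1) into the definition of the adjusted R-squared. The symbols n (number of cases) and p (number of predictors, not counting the intercept) are introduced here for illustration, so treat this as a sketch under those assumptions:

```latex
\begin{aligned}
R^2_{adj} &= 1 - \frac{SSE/(n-p-1)}{SST/(n-1)}
           = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} \\
          &= R^2 - \frac{p}{n-p-1}\,(1 - R^2)
\end{aligned}
```

In the last form the adjustment appears as a non-negative penalty subtracted from R-squared, so the adjusted value cannot exceed the ordinary one.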

In contrast to the conventions described above for regression analysis of non-experimental data, it is not standard practice to report the percentage of variance explained in a designed experiment. R-squared can easily be calculated from any ANOVA table, of course:

**R-squared = SS(Between Groups)/SS(Total)**

The Greek symbol "**Eta-squared**" is sometimes used to denote
this quantity. Of course the calculation of the coefficients is identical
despite the different terminology, as is obvious when the definition is
written in terms of the

**R-squared = 1 - SS(Error)/SS(Total)**
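To make the equivalence of the two expressions concrete, here is a small sketch that computes both from raw group data. The function name `eta_squared` and the scores are made up for illustration.

```python
import numpy as np

def eta_squared(groups):
    """Return eta-squared computed two ways from a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    ss_total = ((all_y - grand_mean) ** 2).sum()
    # Between-groups SS: group sizes times squared deviations of group means.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Error SS: squared deviations of cases from their own group mean.
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ss_between / ss_total, 1 - ss_error / ss_total

# Made-up scores for three hypothetical treatment groups.
groups = [[4, 5, 6, 5], [7, 8, 9, 8], [5, 6, 7, 6]]
via_between, via_error = eta_squared(groups)
print(via_between, via_error)
```

The two values agree because SS(Total) = SS(Between Groups) + SS(Error) in a one-way layout.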

Note that **Eta** is reported if you use the

Suppose, for instance, that an experimental intervention really increases
response variable **Y** by 10 points on the average, with a standard
deviation of 2 points within each group. Imagine a simple experiment where **n** subjects
get the intervention and a multiple **kn** do not, and let n be large
so I can ignore sampling error. Then it works out that the value of **Eta-squared**
is equal to:

**Eta-squared = 25 / (25 + (k + 1)^{2} / k)**

When the treatment and control groups are of equal size (k=1) the Eta-squared is 25/29, and this is its maximal value. If the two groups differ greatly in size, say with k = 10, Eta-squared is smaller, only 25/37.1. The phenomenon is the same, the effect of treatment in average points gained is the same, but the correlation coefficient Eta is not the same.
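The arithmetic behind these figures can be checked with a short sketch. It assumes the setup described above (a 10-point effect, a within-group standard deviation of 2, and n large enough to ignore sampling error); the function name `eta_squared_two_group` is hypothetical.

```python
def eta_squared_two_group(k, effect=10.0, sd=2.0):
    """Population eta-squared when n cases are treated and k*n are controls."""
    p_treated = 1.0 / (1.0 + k)                 # share of cases treated
    # Variance of the group means around the grand mean.
    between_var = effect ** 2 * p_treated * (1.0 - p_treated)
    within_var = sd ** 2
    return between_var / (between_var + within_var)

for k in (1, 2, 10, 100):
    print(k, round(eta_squared_two_group(k), 4))
```

The between-group variance is proportional to the product of the two group shares, which is largest when the groups are equal in size. That is why k = 1 gives the maximum.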

This example is one in which the independent variable is dichotomous, the classic treatment-control experiment. Experiments can be done with a continuous independent variable, for instance where X is the dosage in a drug study. The experimenter may then assign cases to different X values as she sees fit. If we suppose that there is really a linear relationship between dosage X and outcome Y on the average, with random residuals that have a standard deviation σ, it would be appropriate to do a regression analysis and R-squared would be automatically calculated by the computer program. Again, however, it can be shown that the researcher's decision on what X values to use will affect the value of "the proportion of variation explained by the model." If cases are placed at extreme values (say half the subjects have very low X, half very high) the R-squared will be larger than if they are close together.
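A small simulation illustrates the point. Everything numeric here (the slope, the residual standard deviation, the dose ranges, the sample size) is a hypothetical choice made for the sketch, not something taken from a real study.

```python
import numpy as np

def r2_simple(x, y):
    """R-squared of a simple linear regression, i.e. the squared correlation."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

rng = np.random.default_rng(1)
n, slope, sigma = 2000, 1.0, 5.0     # hypothetical values

# Design A: half the cases at a very low dose, half at a very high dose.
x_extreme = np.repeat([0.0, 20.0], n // 2)
# Design B: all doses bunched in a narrow band around the middle.
x_narrow = rng.uniform(8.0, 12.0, size=n)

# Same true line and same residual spread in both designs.
y_extreme = slope * x_extreme + rng.normal(0.0, sigma, size=n)
y_narrow = slope * x_narrow + rng.normal(0.0, sigma, size=n)

r2_extreme = r2_simple(x_extreme, y_extreme)
r2_narrow = r2_simple(x_narrow, y_narrow)
print(round(r2_extreme, 3), round(r2_narrow, 3))
```

The true dose-response line is identical in the two designs. Only the spread of the chosen X values differs, and that alone moves R-squared from large to small.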

This characteristic of the Pearson correlation was known to the ancients.
Charles Spearman (the grandfather of 20th-century psychometrics),
in his 1904 paper on intelligence, described it as the problem of **attenuation**
of the correlation coefficient. Attenuation arises in experiments and in
observational studies when the sample is selected from restricted ranges
of the independent variable X rather than strictly at random. Adjusting
for attenuation is a standard topic in psychological statistics texts.
Sociologists are more likely to think of their samples as "representative"
of the population on all variables, and therefore pay little attention
to the issue. The sample r or Multiple R will not be a good estimate of
the corresponding population parameter if the sample is (deliberately or
accidentally) biased.

The 1981 reader by Peter Marsden (**Linear Models in Social Research**)
contains
some useful and readable papers, and his introductory sections deserve
to be read (as an unusually perceptive book reviewer noted in the journal

Changes of scale are trivial in one sense, for they do not affect the
underlying reality or the degree of fit of a linear model to data. Choosing
to measure distance in meters rather than feet is a matter of taste or
convention, not a matter for the theoretical physicist or statistician
to worry about. **But since such changes affect the values of numbers,
they may have an impact on a naive researcher** whose goal is to evaluate
"the relative importance of different explanatory variables" or "the relative
importance of a given variable in two or more different populations" (Marsden,
p. 15). While there are an infinite number of ways to change scales of
measurement, the standardization technique is the one most often adopted
by social and behavioral scientists. The standardized regression coefficients
are often called "beta weights" or simply "betas" in some books and are
routinely calculated and reported in SPSS.

Agresti and Finlay (p.416) illustrate standardization in a model in
which the subject's "life events" and "socio-economic status" have been
used to predict "mental impairment". The respective coefficients are .103
and -.097, indicating that "there is a .1-unit increase in the estimated
mean of mental impairment for every 1-unit increase in the life events
score, controlling for SES" (p. 392) compared to a decrease of .097 in
estimated mean mental impairment when SES increases by one point and life
events are held constant. These two "effects" are hard to compare since
the two predictors have entirely different units of measurement. After
standardizing, the regression coefficients are .43 and -.45, respectively,
and A&F conclude that the two coefficients have similar magnitudes:
a "standard deviation increase in X_{2}, controlling for X_{1}"
has about the same effect on mental impairment as "a standard deviation
increase in X_{1}, controlling for X_{2}", but in the
opposite direction.
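The mechanics of standardization can be sketched with simulated data (these are not the Agresti and Finlay data, and the variable names are placeholders). Each unstandardized slope is multiplied by the ratio of its predictor's standard deviation to the standard deviation of Y, which gives the same result as rerunning the regression on z-scores.

```python
import numpy as np

def ols(X, y):
    """OLS slopes (intercept fitted but not returned) of y on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(50, 10, size=n)      # predictor measured on a wide scale
x2 = rng.normal(5, 1, size=n)        # predictor measured on a narrow scale
y = 0.1 * x1 - 1.0 * x2 + rng.normal(0, 2, size=n)
X = np.column_stack([x1, x2])

b = ols(X, y)                                        # raw-unit slopes
beta_rescaled = b * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Same thing obtained by regressing z-scores on z-scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta_from_z = ols(Z, zy)
print(b, beta_rescaled)
```

In this made-up example the raw slopes differ by a factor of about ten while the standardized ones are of similar size, because the standard deviations of the two predictors also differ by a factor of ten.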

The **attenuation problem** also arises in this context, unless the
data being used are a simple random sample from the population. If stratified
sampling has been used, or if the data are from a designed experiment,
**the standard deviations of the predictors may not be unbiased estimates of
their population analogs**. While the unstandardized regression coefficients
will usually be good estimates of the population model parameters, the
standardized coefficients will not be generalizable and thus are difficult
to interpret.
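A simulated illustration of this contrast, under assumed values: the "population" below has a true slope of 2, and the restricted sample keeps only cases with X near the middle of its range. The cutoff and all other numbers are hypothetical.

```python
import numpy as np

def slope_and_beta(x, y):
    """Unstandardized slope and standardized coefficient for simple regression."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)
    return float(b), float(beta)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=20000)
y = 2.0 * x + rng.normal(0.0, 1.0, size=20000)

# A sample drawn from a restricted range of X rather than at random.
keep = np.abs(x) < 0.5
b_full, beta_full = slope_and_beta(x, y)
b_sub, beta_sub = slope_and_beta(x[keep], y[keep])
print(round(b_full, 2), round(b_sub, 2))
print(round(beta_full, 2), round(beta_sub, 2))
```

The unstandardized slope is estimated about equally well in both samples, while the standardized coefficient shrinks in the restricted sample because the standard deviation of X there understates the population value.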

**Kim & Ferree** argued forcefully that routine use of standardized
coefficients to solve the problem of comparing apples and oranges is not
justifiable, and that it is possible to evaluate relative importance of
predictors only when some legitimate common unit of measurement is available
for all predictors. Agresti and Finlay (p. 419) warn against using standardized
coefficients when comparing the results of the same regression analysis
on different groups. **Hubert Blalock**, of course, had made the same
points many years before (see Chapter 8 of his 1971 reader **Causal
Models in the Social Sciences**, which reproduces his 1967 article).

Despite these warnings, social and behavioral science applications of regression analysis in the period 1960 - 1990 were very likely to use standardized variables. My opinion is that it is only in the last decade that the tide has turned toward analysis that emphasizes measured units and de-emphasizes the goal of comparative effect evaluation.

These issues apply to single-equation regression models, but become
even more involved when a multiple equation causal model is being studied.
Early converts to Sewall Wright's **path analysis methodology**
saw as their goal the decomposition of X/Y correlations into direct effects,
indirect effects, and effects due to common causes. The Pearson correlations
among the variables served as the raw data for such analyses and the path
coefficients used in the decomposition of effects were standardized regression
coefficients.
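As a rough sketch of such a decomposition, consider a hypothetical three-variable model in which X1 affects Y both directly and through X2. The model, the coefficients, and the simulated data are assumptions made purely for illustration. The correlation between X1 and Y then splits into a direct path (the standardized coefficient of X1) and an indirect path (the standardized coefficient of X2 times the X1-X2 correlation).

```python
import numpy as np

def standardize(v):
    """Convert a variable to z-scores using the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)             # X1 -> X2
y = 0.3 * x1 + 0.4 * x2 + rng.normal(size=n)   # X1 -> Y and X2 -> Y

# Standardized regression of Y on X1 and X2 (no intercept needed for z-scores).
Z = np.column_stack([standardize(x1), standardize(x2)])
beta1, beta2 = np.linalg.lstsq(Z, standardize(y), rcond=None)[0]

r12 = np.corrcoef(x1, x2)[0, 1]
r_y1 = np.corrcoef(x1, y)[0, 1]

direct = beta1            # path X1 -> Y
indirect = beta2 * r12    # path X1 -> X2 -> Y
print(round(r_y1, 3), round(direct, 3), round(indirect, 3))
```

Because every ingredient is a correlation or a standardized coefficient, the decomposition inherits whatever sensitivity to sampling design those quantities have.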

To summarize, **correlations (whether r or R) can be considered as
characteristics of a population as well as descriptions of a sample. Non-random
samples will not necessarily provide good estimates of these correlations.**
Under such circumstances standardized regression coefficients, R-squares,
and "path coefficients" computed from the sample data in routine ways may
not be good estimates of the population phenomena the researcher is seeking
to understand. The aforementioned reviewer of Marsden's reader, noting
that some of the articles in the book used data from designed experiments
or non-simple random samples, pointed out that:

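In his article on standardized coefficients J. Bring (*The American
Statistician*, August 1994, pp. 209-213) points out the formula
relating the square of the **t value** for predictor i to the increment in R-squared obtained by adding that predictor. With n cases and p predictors plus an intercept, it can be written:

*t_{i}^{2} = (R^{2} - R^{2}_{(i)}) (n - p - 1) / (1 - R^{2})*

Here *R^{2}_{(i)}* stands for the *R^{2}*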
with predictor i removed from the equation. This quantity is also the F
statistic for testing whether the full model is a significant improvement
on the simpler model. (See Agresti and Finlay, p. 404.)
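The link between the squared t value and the increment in R-squared can be checked numerically. The sketch below fits a full model and a model with one predictor removed, then compares the squared t value with the increment in R-squared scaled by the error degrees of freedom over one minus the full R-squared. The helper `fit` and the simulated data are assumptions for illustration. The last two lines also compare adjusted R-squared values, which ties back to the "t less than one" rule discussed earlier.

```python
import numpy as np

def fit(X, y):
    """Return coefficients, standard errors, R-squared, and error df (intercept added)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = resid @ resid
    df_error = n - design.shape[1]
    cov = sse / df_error * np.linalg.inv(design.T @ design)
    r2 = 1 - sse / ((y - y.mean()) ** 2).sum()
    return coef, np.sqrt(np.diag(cov)), r2, df_error

rng = np.random.default_rng(5)
n = 60
X = rng.normal(size=(n, 3))
y = 1.0 + 0.8 * X[:, 0] + 0.05 * X[:, 2] + rng.normal(size=n)

coef, se, r2_full, df_error = fit(X, y)
i = 2                                       # predictor to drop (0-based column)
t_squared = (coef[i + 1] / se[i + 1]) ** 2  # +1 skips the intercept
_, _, r2_reduced, _ = fit(np.delete(X, i, axis=1), y)
from_r2 = (r2_full - r2_reduced) * df_error / (1 - r2_full)
print(t_squared, from_r2)

# Adjusted R-squared with and without the predictor.
adj_full = 1 - (1 - r2_full) * (n - 1) / df_error
adj_reduced = 1 - (1 - r2_reduced) * (n - 1) / (df_error + 1)
```

Dropping the predictor raises the adjusted R-squared exactly when the squared t value is below one, which is the algebra behind the "keep everything with t bigger than 1" habit.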

The increment in R-squared is also related to another widely used measure,
the partial correlation coefficient between Y and the i^{th} predictor,
controlling for the other variables in the model. The neatest expression
I know for the square of this partial correlation is:

This can be interpreted as the proportion of the remaining unexplained
variance that is accounted for by adding predictor i to an existing model.
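One expression that matches this interpretation is the increment in R-squared divided by one minus the R-squared of the model without predictor i. Whether this is the exact form the author had in mind is an assumption. The sketch below checks it against the partial correlation computed the long way, by correlating two sets of residuals, using simulated data and hypothetical helper names.

```python
import numpy as np

def residuals(target, others):
    """Residuals from regressing target on the columns of others plus an intercept."""
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def r2(X, y):
    """R-squared of y regressed on X with an intercept."""
    e = residuals(y, X)
    return 1 - (e @ e) / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(6)
n = 80
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(size=n)

i = 1                                    # predictor of interest (0-based column)
others = np.delete(X, i, axis=1)

# Partial correlation: correlate what is left of Y and of X_i after
# removing the linear influence of the other predictors.
partial_r = np.corrcoef(residuals(y, others), residuals(X[:, i], others))[0, 1]

r2_full, r2_reduced = r2(X, y), r2(others, y)
from_r2 = (r2_full - r2_reduced) / (1 - r2_reduced)
print(partial_r ** 2, from_r2)
```

The denominator is the share of variation left unexplained by the other predictors, so the ratio is the fraction of that remainder picked up by predictor i.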
