Basic and Clinical Biostatistics > Chapter 3. Summarizing Data & Presenting Data in Tables & Graphs >

Key Concepts

All observations of subjects in a study are evaluated
on a scale of measurement that determines how the observations should
be summarized, displayed, and analyzed.

Nominal scales are used to categorize discrete characteristics.

Ordinal scales categorize characteristics that have an inherent
order.

Numerical scales measure the amount or quantity of something.

Means measure the middle of the distribution of a numerical
characteristic.

Medians measure the middle of the distribution of an ordinal
characteristic or a numerical characteristic that is skewed.

The standard deviation is a measure of the spread of observations
around the mean and is used in many statistical procedures.

The coefficient of variation is a measure of relative spread
that permits the comparison of observations measured on different
scales.

Percentiles are useful to compare an individual observation
with a norm.

Stem-and-leaf plots are a combination of frequency tables
and histograms that are useful in exploring the distribution of
a set of observations.

Frequency tables show the number of observations having a
specific characteristic.

Histograms, box plots, and frequency polygons display distributions
of numerical observations.

Proportions and percentages are used to summarize nominal
and ordinal data.

Rates describe the number of events that occur in a given
period.

Prevalence and incidence are two important measures of morbidity.

Rates must be adjusted when populations being compared differ
in an important confounding factor.

The relationship between two numerical characteristics is
described by the correlation.

The relationship between two nominal characteristics is described
by the risk ratio, odds ratio, and event rates.

Number needed to treat is a useful indication of the effectiveness
of a given therapy or procedure.

Scatterplots illustrate the relationship between two numerical
characteristics.

Poorly designed graphs and tables mislead in the information
they provide.

Computer programs are essential in today's research
environment, and skills to use and interpret them can be very useful.

Presenting Problems

Presenting Problem 1

Pulmonary embolism (PE) is a leading cause of morbidity and mortality.
Clinical features are nonspecific and a certain diagnosis is often
difficult to make. Attempts to simplify and improve the diagnostic
process in evaluating patients for possible PE have been made by
the introduction of two components: determination of pretest probability
and D-dimer testing. Pretest probability
is determined by developing explicit criteria for determining the
clinical probability of PE. D-dimer
assays measure the formation of D-dimer
when cross-linked fibrin in thrombi is broken down by plasmin. Elevated levels of D-dimer
can be used to detect deep venous thrombosis (DVT) and PE. Some D-dimer tests are very sensitive for
DVT and a normal result can be used to exclude venous thromboembolism.

Kline and colleagues (2002) wished to develop a set of clinical
criteria that would define a subgroup of patients with a pretest
probability of PE of greater than 40% (high-risk group).
These patients would be at too great a risk of experiencing a PE
to have the diagnosis excluded on the basis of D-dimer
testing. However, patients with a lower pretest probability (low-risk
group), in whom a normal result can help rule out the diagnosis
of PE, might be suitable candidates for D-dimer
testing. Data were available for 934 patients with suspected PE
at seven urban emergency departments (ED) in the United States.
The investigators measured a number of potential risk factors for
PE, and we look at some basic attributes of the observations in
this chapter. A random sample of shock index observations on 18
patients is given in the section titled, "Calculating Measures
of Central Tendency" and the entire data set is in a folder
on the CD-ROM [available only with the book] entitled "Kline."

Presenting Problem 2

The aging of the baby-boomers is leading to important demographic
changes in the population, with significant implications for health
care planners. Over the next 30 years in the United States, the
proportion of people over the age of 75 years is expected to increase
greatly. With the aging of the population, functional decline resulting
in disability and morbidity is a major challenge to health care
systems.

Hébert and coworkers (1997) designed a study to measure
disability and functional changes over a 2-year period in a community-dwelling
population age 75 years and older. A nurse interviewed 655 residents
in their homes in Quebec, Canada. The Functional Autonomy Measurement
System (SMAF), a 29-item rating scale measuring functional disability
in five areas, was administered together with a questionnaire measuring
health, cognitive function, and depression. Each individual was
interviewed again 1 and 2 years later by the same nurse. The SMAF
scale rates each item on a 4-point scale, where 0 is independent
and 3 is dependent. Functional decline was defined by an increase
of 5 points or more on the questionnaire, and stability as a change
within ±4 points. (The final analysis included
572 subjects, 504 of whom completed both follow-up interviews and
68 of whom died during the study.)

The authors wanted to summarize the data and estimate declines
in functional status. They also wanted to examine the relationship
between changes in scores over the two 1-year periods. Data are
given in the section titled "Displaying Numerical Data
in Tables & Graphs" and on the CD-ROM [available only with the book] in a folder entitled "Hébert."

Presenting Problem 3

A large study in a primary care clinic found that one in 20 women
had experienced domestic violence (DV) in the previous year and
one in four had experienced it sometime during her adult life. Children
from violent homes often suffer behavioral, emotional, and physical
health consequences either because they themselves are abused or
because they have witnessed the abuse of their mother. Pediatricians
have a unique opportunity to screen mothers because DV-battered women
may be more likely to obtain medical care for their children than
for themselves.

Lapidus and colleagues (2002) at the Connecticut Children's
Medical Center conducted a survey to assess DV education and training
and the use of DV screening among a representative statewide sample
of pediatricians and family physicians. They mailed self-administered
surveys to 903 physicians identified as active members of the American
Academy of Pediatrics and the American Academy of Family Physicians.
The survey requested information on physician demographics and issues
relating to DV. Domestic violence was defined as "past
or current physical, sexual, emotional, or verbal harm to a woman
caused by a spouse, partner, or family member." Overall, 49% of
the physicians responded to the survey after a total of three mailings.
The authors looked at the distribution of responses and calculated
some measures of predictive factors. We will revisit this study
in the chapter on survey research. In this chapter, we illustrate
frequency tables and odds ratios calculated by these investigators.
Data are on the CD-ROM [available only with the book] in a folder entitled "Lapidus."

Presenting Problem 4

Factor VIII is one of the procoagulants of the intrinsic pathway
of coagulation. Hemophilia A, a disease affecting about 1 in 10,000
males, is a hereditary hemorrhagic disorder characterized by deficient
or defective factor VIII. Acquired hemophilia is a much rarer hemorrhagic
disorder affecting 1 person per million each year and characterized
by spontaneous development of an autoantibody directed against factor
VIII. Patients often present with ecchymosis, hematomas, hematuria,
or compressive neuropathy. The hemorrhagic complications are fatal
in 14–22% of patients. Underlying diseases, including
autoimmune diseases and malignancies, are often associated with
acquired hemophilia.

Optimal treatment is not yet established and, because the disease
is so rare, no randomized controlled trials of treatment have been
undertaken. A retrospective study of 34 patients with acquired hemophilia
due to factor VIII inhibitors was conducted along with an extensive
literature review to clarify the clinical characteristics of this
disease and plan a prospective study of optimal treatment (Bossi
et al, 1998). Information from the study is given in the section
titled "Tables and Graphs for Nominal and Ordinal Data." The
investigators want to summarize data on some risk factors for men
and women separately.

Presenting Problem 5

Premature birth, especially after fewer than 32 weeks of gestation,
is associated with a high incidence of respiratory distress syndrome
and a form of chronic lung disease known as bronchopulmonary dysplasia.
Lung disease is the principal cause of morbidity and mortality in
premature infants.

Thyroid hormones stimulate fetal lung development in animals.
Little thyroid hormone is transferred from mother to fetus, but
thyrotropin-releasing hormone (TRH) given to the mother increases
fetal serum concentrations of thyroid hormone. Several studies have
shown that the antenatal administration of TRH reduces the incidence
and severity of respiratory distress syndrome, chronic lung disease,
and death in these high-risk infants. Two other studies showed no benefit
from treatment with TRH.

Ballard and coinvestigators (1998) wanted to reassess the efficacy
and safety of antenatal administration of TRH in improving pulmonary
outcome in preterm infants. Most of the earlier studies were relatively
small, and one had not been blinded. Also, changes in neonatal care
implemented in the past decade, particularly the use of surfactant,
improved the chances of survival of premature infants.

The study enrolled 996 women in active labor with gestations
of at least 24 but fewer than 30 weeks into a randomized, double-blind,
placebo-controlled trial of antenatal TRH. The women receiving active
treatment were given four doses of 400 μg of TRH
intravenously at 8-h intervals. Those receiving placebo were given
normal saline. Both groups received glucocorticoids, and surfactant
was given to the infants when clinically indicated. There were 1134
live births (844 single and 290 multiple) and 11 stillbirths.

Infants born at 32 or fewer weeks gestation constituted the group
at risk for lung disease; those born at 33 weeks or later were not
at risk for lung disease. Outcomes included infant death on or before
the 28th day after delivery; chronic lung disease, defined as the
need for oxygen therapy for 21 of the first 28 days of life; and
the development of respiratory distress syndrome, defined as the need
for oxygen and either assisted ventilation or radiologic findings.
The authors wanted to find the risk of developing these outcomes
in the TRH group compared with the placebo group. Selected results
from the study are given in the section titled "Number
Needed to Treat."

Purpose of the Chapter

This chapter introduces different kinds of data collected in
medical research and demonstrates how to organize and present summaries
of the data. Regardless of the particular research being done, investigators
collect observations and generally want to transform them into tables
or graphs or to present summary numbers, such as percentages or
means. From a statistical perspective, it does not matter whether
the observations are on people, animals, inanimate objects, or events.
What matters is the kind of observations and the scale on which
they are measured. These features determine the statistics used
to summarize the data, called descriptive
statistics, and the types of tables or graphs that best display
and communicate the observations.

We use the data from the presenting problems to illustrate the
steps involved in calculating the statistics because we believe
that seeing the steps helps most people understand procedures. As
we emphasize throughout this book, however, we expect that most
people will use a computer to analyze data. In fact, this and following
chapters contain numerous illustrations from some commonly used
statistical computer programs, including NCSS contained on the CD-ROM [available only with the book] .

Scales of Measurement

The scale for measuring a characteristic has implications for
the way information is displayed and summarized. As we will see
in later chapters, the scale of measurement—the
precision with which a characteristic is measured—also
determines the statistical methods for analyzing the data. The three
scales of measurement that occur most often in medicine are nominal,
ordinal, and numerical.

Nominal Scales

Nominal scales are used for the
simplest level of measurement when data values fit into categories.
For example, in Presenting Problem 5
Ballard and colleagues (1998)
use the following nominal characteristic to describe the outcome
in infants being treated with antenatal TRH: the development of
respiratory distress syndrome. In this example, the observations
are dichotomous or binary in that the outcome can take
on only one of two values: yes or no. Although we talk about nominal
data as being on the measurement scale, we do not actually measure
nominal data; instead, we count the number of observations with
or without the attribute of interest.

Many classifications in medical research are evaluated on a nominal
scale. Outcomes of a medical treatment or surgical procedure, as
well as the presence of possible risk factors, are often described
as either occurring or not occurring. Outcomes may also be described
with more than two categories, such as the classification of anemias
as microcytic (including iron deficiency), macrocytic or megaloblastic
(including vitamin B12
deficiency), and normocytic (often
associated with chronic disease).

Data evaluated on a nominal scale are sometimes called qualitative observations, because
they describe a quality of the person or thing studied, or categorical observations, because
the values fit into categories. Nominal or qualitative data are
generally described in terms of percentages or proportions, such as the fact that
38% of the patients in the study of patients with acquired
hemophilia (Bossi et al, 1998) developed hematuria. Contingency tables and bar charts are most often used to
display this type of information and are presented in the section
titled "Tables and Graphs for Nominal and Ordinal Data."

Ordinal Scales

When an inherent order occurs among the categories, the observations
are said to be measured on an ordinal scale. Observations
are still classified, as with nominal scales, but some observations
have more or are greater
than other observations. Clinicians often use ordinal scales
to determine a patient's amount of risk or the appropriate
type of therapy. Tumors, for example, are staged according to their
degree of development. The international classification for staging
of carcinoma of the cervix is an ordinal scale from 0 to 4, in which
stage 0 represents carcinoma in situ and stage 4 represents carcinoma extending
beyond the pelvis or involving the mucosa of the bladder and rectum.
The inherent order in this ordinal scale is, of course, that the
prognosis for stage 4 is worse than that for stage 0.

Classifications based on the extent of disease are sometimes
related to a patient's activity level. For example, rheumatoid
arthritis is classified, according to the severity of disease, into
four classes ranging from normal activity (class 1) to wheelchair-bound
(class 4). Using the Functional Autonomy Measurement System developed
by the World Health Organization, Hébert and coinvestigators
(1997) studied the functional activity of elderly people who live
in a community. Although order exists among categories in ordinal
scales, the difference between two adjacent categories is not the
same throughout the scale. To illustrate, Apgar scores, which describe
the maturity of newborn infants, range from 0 to 10, with lower
scores indicating depression of cardiorespiratory and neurologic
functioning and higher scores indicating good functioning. The difference
between scores of 8 and 9 probably does not have the same clinical
implications as the difference between scores of 0 and 1.

Some scales consist of scores for multiple factors that are then
added to get an overall index. An index frequently used to estimate
the cardiac risk in noncardiac surgical procedures was developed
by Goldman and his colleagues (1977, 1995). This index assigns points
to a variety of risk factors, such as age over 70 years, history
of an MI in the past 6 months, specific electrocardiogram abnormalities,
and general physical status. The points are added to get an overall
score from 0 to 53, which is used to indicate the risk of complications
or death for different score levels.

A special type of ordered scale is a rank-order
scale, in which observations are ranked from highest to lowest
(or vice versa). For example, health providers could direct their
education efforts aimed at the obstetric patient based on ranking
the causes of low birthweight in infants, such as malnutrition,
drug abuse, and inadequate prenatal care, from most common to least
common. The duration of surgical procedures might be converted to
a rank scale to obtain one measure of the difficulty of the procedure.

As with nominal scales, percentages and proportions are often
used with ordinal scales. The entire set of data measured on an
ordinal scale may be summarized by the median value,
and we will describe how to find the median and what it means. Ordinal
scales having a large number of values are sometimes treated as
if they are numerical (see following section). The same types of
tables and graphs used to display nominal data may also be used
with ordinal data.

Numerical Scales

Observations for which the differences between numbers have meaning
on a numerical scale are sometimes called quantitative
observations because they measure the quantity of something.
There are two types of numerical scales: continuous^{a} (interval)
and discrete scales. A continuous scale has
values on a continuum (eg, age); a discrete
scale has values equal to integers (eg, number of fractures).

^{a}Some statisticians differentiate interval scales (with
an arbitrary zero point) from ratio scales (with an absolute zero
point); examples are temperature on a Celsius scale (interval) and
temperature on a Kelvin scale (ratio). Little difference exists,
however, in how measures on these two scales are treated statistically,
so we call them both simply numerical.

If data need not be very precise, continuous data may be reported
to the closest integer. Theoretically, however, more precise measurement
is possible. Age is a continuous measure, and age recorded to the
nearest year will generally suffice in studies of adults; however,
for young children, age to the nearest month may be preferable.
Other examples of continuous data include height, weight, length
of time of survival, range of joint motion, and many laboratory
values.

When a numerical observation can take on only integer values,
the scale of measurement is discrete. For example, counts of things—number
of pregnancies, number of previous operations, number of risk factors—are
discrete measures.

In the study by Kline and colleagues (2002), several patient
characteristics were evaluated, including shock index and presence
of PE. The first characteristic is measured on a continuous numerical
scale because it can take on any individual value in the possible
range of values. Presence of PE has a nominal scale with only two
values: presence or absence. In the study by Ballard and coworkers
(1998), the number of infants who developed respiratory distress
syndrome is an example of a discrete numerical scale.

Characteristics measured on a numerical scale are frequently
displayed in a variety of tables and graphs. Means and standard
deviations are generally used to summarize the values of numerical measures.
We next examine ways to summarize and display numerical data and
then return to the subject of ordinal and nominal data.

Summarizing Numerical Data with Numbers

When an investigator collects many observations, such as shock
index or blood pressure in the study by Kline and colleagues (2002),
numbers that summarize the data can communicate a lot of information.

Measures of the Middle

One of the most useful summary numbers is an indicator of the
center of a distribution of observations—the middle or
average value. The three measures of central tendency used in medicine
and epidemiology are the mean, the median, and, to a lesser extent,
the mode. All three are used for numerical data, and the median
is used for ordinal data as well.

Calculating Measures of Central Tendency

The Mean

Although several means may be mathematically calculated, the
arithmetic, or simple, mean is used most frequently in statistics
and is the one generally referred to by the term "mean." The mean is the arithmetic average of
the observations. It is symbolized by X̄ (called X-bar) and is calculated as follows:
add the observations to obtain the sum and then divide by the number
of observations.

The formula for the mean is written ΣX/n, where Σ (the Greek
letter sigma) means to add, X represents
the individual observations, and n is
the number of observations.

Table 3–1 gives the value of the shock index, systolic
blood pressure, and heart rate for 18 randomly selected patients
in the D-dimer study (Kline et al, 2002).
(We will learn about random sampling in Chapter 4.) The mean shock
index for these 18 patients is

X̄ = ΣX/n = 12.41/18 = 0.69

Table 3–1. Shock Index for a Random Sample of 18 Patients.

Subject ID   Shock Index   Systolic Blood Pressure   Heart Rate
     1           0.61               139                  85
     2           0.56               151                  84
     3           0.52               201                 104
     4           0.33               170                  56
     5           0.45               123                  55
     6           0.74               121                  90
     7           0.73               119                  87
     8           0.92               100                  92
     9           0.42               164                  69
    10           0.63               161                 102
    11           0.55               164
    12           0.50               138                  69
    13           0.75               118                  89
    14           0.82               130                 106
    15           1.30               109                 142
    16           1.29                92                 119
    17           0.85               126                 107
    18           0.44               139                  61

Source: Data, used with
permission of the authors and publisher, Kline JA, Nelson RD, Jackson
RE, Courtney DM: Criteria for the safe use of D-dimer
testing in emergency department patients with suspected pulmonary
embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152.
Table produced with NCSS; used with permission.

The mean is used when the numbers can be added (ie, when the
characteristics are measured on a numerical scale); it should not
ordinarily be used with ordinal data because of the arbitrary nature of
an ordinal scale. The mean is sensitive to extreme values in a set
of observations, especially when the sample size is fairly small.
For example, the values of 1.30 for subject 15 and 1.29 for subject
16 are relatively large compared with the others. If these values
were not present, the mean would be 0.612 instead of 0.689.
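The arithmetic described above can be checked with a short Python sketch (not part of the text) using the shock index values from Table 3–1:

```python
# Shock index values for the 18 patients in Table 3-1
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# Mean: add the observations to obtain the sum, then divide by n
total = sum(shock_index)
mean = total / len(shock_index)
print(round(total, 2))  # 12.41
print(round(mean, 3))   # 0.689
```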

If the original observations are not available, the mean can
be estimated from a frequency table. A weighted
average is formed by multiplying each data value by the number
of observations that have that value, adding the products, and dividing
the sum by the number of observations. We have formed a frequency
table of shock index observations in Table 3–2, and we
can use it to estimate the mean shock index for all patients in
the study. The weighted-average estimate of the mean, using the number
of subjects and the midpoints in each interval, is

estimated mean = Σ(count × interval midpoint)/n ≈ 0.69

Table 3–2. Frequency Distribution of Shock Index in 10-Point Intervals.

Shock Index        Count   Cumulative Count   Percent   Cumulative Percent   Graph of Percent
0.40 or less          38          38             4.08           4.08          /
0.40 up to 0.50      104         142            11.16          15.24          ////
0.50 up to 0.60      198         340            21.24          36.48          ////////
0.60 up to 0.70      199         539            21.35          57.83          ////////
0.70 up to 0.80      155         694            16.63          74.46          //////
0.80 up to 0.90      102         796            10.94          85.41          ////
0.90 up to 1.00       60         856             6.44          91.85          //
1.00 up to 1.10       37         893             3.97          95.82          /
1.10 up to 1.20       19         912             2.04          97.85          /
1.20 or higher        19         932             2.15         100.00          /

Source: Data, used with
permission of the authors and publisher, Kline JA, Nelson RD, Jackson
RE, Courtney DM: Criteria for the safe use of D-dimer
testing in emergency department patients with suspected pulmonary
embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152.
Table produced with NCSS; used with permission.

The value of the mean calculated from a frequency table is not
always the same as the value obtained with raw numbers. In this
example, the shock index means calculated from the raw numbers and
the frequency table are very close. Investigators who calculate
the mean for presentation in a paper or talk have the original observations,
of course, and should use the exact formula. The formula for use
with a frequency table is helpful when we as readers of an article
do not have access to the raw data but want an estimate of the mean.
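As readers we can carry out the weighted-average estimate from Table 3–2 in a few lines of Python. The table does not state midpoints for the two open-ended intervals, so the values 0.35 and 1.25 below are assumptions:

```python
# Counts from Table 3-2; the midpoints for the two open-ended intervals
# ("0.40 or less" and "1.20 or higher") are assumed to be 0.35 and 1.25
counts    = [38, 104, 198, 199, 155, 102, 60, 37, 19, 19]
midpoints = [0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 1.05, 1.15, 1.25]

# Weighted average: multiply each midpoint by its count, add the
# products, and divide by the number of observations
n = sum(counts)
weighted_mean = sum(f * m for f, m in zip(counts, midpoints)) / n
print(round(weighted_mean, 2))  # 0.69
```

The result is close to the mean computed from the raw observations, as the text notes.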

The Median

The median is the middle observation,
that is, the point at which half the observations are smaller and
half are larger. The median is sometimes symbolized by M or Md, but
it has no conventional symbol. The procedure for calculating the
median is as follows:

1. Arrange the observations from smallest to largest (or
vice versa).

2. Count in to find the middle value. The median is the middle
value for an odd number of observations; it is defined as the mean
of the two middle values for an even number of observations.

For example, in rank order (from lowest to highest), the shock
index values in Table 3–1 are as follows:

0.33, 0.42, 0.44, 0.45, 0.50, 0.52, 0.55, 0.56, 0.61,
0.63, 0.73, 0.74, 0.75, 0.82, 0.85, 0.92, 1.29, 1.30

For 18 observations, the median is the mean of the ninth and
tenth values (0.61 and 0.63), or 0.62. The median tells us that
half the shock index values in this group are less than 0.62 and
half are greater than 0.62. We will learn later in this chapter
that the median is easy to determine from a stem-and-leaf
plot of the observations.

The median is less sensitive to extreme values than is the mean.
For example, if the largest observation, 1.30, is excluded from
the sample, the median would be the middle value, 0.61. The median
is also used with ordinal observations.
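The two-step procedure can be sketched in Python (using the Table 3–1 values):

```python
# Shock index values for the 18 patients in Table 3-1
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# Step 1: arrange the observations from smallest to largest
ordered = sorted(shock_index)
n = len(ordered)

# Step 2: count in to the middle; with an even number of observations,
# the median is the mean of the two middle values
if n % 2 == 1:
    med = ordered[n // 2]
else:
    med = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
print(round(med, 2))  # 0.62

# The median barely moves if the extreme value 1.30 is excluded
trimmed = sorted(x for x in shock_index if x != 1.30)
print(trimmed[len(trimmed) // 2])  # 0.61
```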

The Mode

The mode is the value that occurs
most frequently. It is commonly used for a large number of observations when
the researcher wants to designate the value that occurs most often.
No single observation occurs most frequently among the data in Table
3–1. When a set of data has two modes, it is called bimodal. For frequency tables or a
small number of observations, the mode is sometimes estimated by
the modal class, which is the interval
having the largest number of observations. For the shock index data
in Table 3–2, the modal class is 0.60 up to 0.70, with 199
patients.
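A sketch in Python, using the Table 3–1 values and the Table 3–2 counts:

```python
from collections import Counter

# Shock index values for the 18 patients in Table 3-1
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# Every value occurs exactly once, so no single mode exists here
print(max(Counter(shock_index).values()))  # 1

# Modal class: the interval in Table 3-2 with the largest count
interval_counts = {
    "0.40 or less": 38, "0.40 up to 0.50": 104, "0.50 up to 0.60": 198,
    "0.60 up to 0.70": 199, "0.70 up to 0.80": 155, "0.80 up to 0.90": 102,
    "0.90 up to 1.00": 60, "1.00 up to 1.10": 37, "1.10 up to 1.20": 19,
    "1.20 or higher": 19,
}
modal_class = max(interval_counts, key=interval_counts.get)
print(modal_class)  # 0.60 up to 0.70
```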

The Geometric Mean

Another measure of central tendency not used as often as the
arithmetic mean or the median is the geometric
mean, sometimes symbolized as GM or G. It is the nth
root of the product of the n observations.
In symbolic form, for n observations X_1, X_2, X_3, . . . , X_n, the
geometric mean is

GM = (X_1 × X_2 × X_3 × . . . × X_n)^(1/n)

The
geometric mean is generally used with data measured on a
logarithmic scale, such as the dilution of the smallpox vaccine
studied by Frey and colleagues (2002), a presenting problem in Chapter
5. Taking the logarithm of both sides of the preceding equation,
we see that the logarithm of the geometric mean is equal to the
mean of the logarithms of the observations.
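A short Python sketch shows both forms of the calculation; the titers below are hypothetical values chosen for illustration, not data from the text:

```python
import math

# Hypothetical titers on a logarithmic scale (illustrative only)
titers = [10, 100, 1000]
n = len(titers)

# The geometric mean is the nth root of the product of the n observations
gm = math.prod(titers) ** (1 / n)
print(round(gm))  # 100

# Equivalently, the log of the GM equals the mean of the logs
mean_log = sum(math.log10(x) for x in titers) / n
print(round(10 ** mean_log))  # 100
```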

Use the CD-ROM [available only with the book] and find the mean, median, and mode for the shock
index for all of the patients in the study by Kline and colleagues
(2002). Repeat for patients who did and did not have a PE. Do you
think the mean shock index is different for these two groups? In
Chapter 6 we will learn how to answer this type of question.

Using Measures of Central Tendency

Which measure of central tendency is best with a particular set
of observations? Two factors are important: the scale of measurement
(ordinal or numerical) and the shape of the distribution of observations.
Although distributions are discussed in more detail in Chapter 4,
we consider here the notion of whether a distribution is symmetric
about the mean or is skewed to the left or the right.

If outlying observations occur in only one direction—either
a few small values or a few large ones—the distribution
is said to be a skewed distribution. If
the outlying values are small, the distribution is skewed to the
left, or negatively skewed; if the outlying values are large, the
distribution is skewed to the right, or positively skewed. A symmetric distribution has the same
shape on both sides of the mean. Figure 3–1 gives examples
of negatively skewed, positively skewed, and symmetric distributions.

Figure 3-1.

Shapes of common distributions of observations. A:
Negatively skewed. B: Positively skewed. C and D: Symmetric.

The following facts help us as readers of articles know the shape
of a distribution without actually seeing it.

1. If the mean and the median are equal, the distribution
of observations is symmetric, generally as in Figures 3–1C and 3–1D.

2. If the mean is larger than the median, the distribution
is skewed to the right, as in Figure 3–1B.

3. If the mean is smaller than the median, the distribution
is skewed to the left, as in Figure 3–1A.
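These rules can be illustrated with small hypothetical samples in Python (the numbers below are made up for illustration):

```python
from statistics import mean, median

# Hypothetical samples (not data from the text)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # a few large outlying values
left_skewed = [-20, 1, 2, 2, 3, 3, 3, 4]   # a few small outlying values

# Rule 2: mean greater than median suggests skew to the right
print(mean(right_skewed) > median(right_skewed))  # True

# Rule 3: mean smaller than median suggests skew to the left
print(mean(left_skewed) < median(left_skewed))  # True
```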

In a study of the increase in educational debt among Canadian
medical students, Kwong and colleagues (2002) reported the median
level of debt for graduating students. The investigators reported
the median rather than the mean because a relatively small number
of students had extremely high debts, which would cause the mean
to be an overestimate. The following guidelines help us decide which
measure of central tendency is best.

1. The mean is used for numerical data and for symmetric
(not skewed) distributions.

2. The median is used for ordinal data or for numerical data
if the distribution is skewed.

3. The mode is used primarily for bimodal distributions.

4. The geometric mean is generally used for observations measured
on a logarithmic scale.

Measures of Spread

Suppose all you know about the 18 randomly selected patients
in Presenting Problem 1 is that the mean shock index is 0.69. Although
the mean provides useful information, you have a better idea of
the distribution of shock indices in these patients if you know
something about the spread, or the variation, of the observations.
Several statistics are used to describe the dispersion of data: range,
standard deviation, coefficient of variation, percentile rank, and
interquartile range. All are described in the following sections.

Calculating Measures of Spread

The Range

The range is the difference between
the largest and the smallest observation. It is easy to determine
once the data have been arranged in rank order. For example, the
lowest shock index among the 18 patients is 0.33, and the highest
is 1.30; thus, the range is 1.30 minus 0.33, or 0.97. Many authors
give minimum and maximum values instead of the range, and in some
ways these values are more useful.
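In Python, for the Table 3–1 values:

```python
# Shock index values for the 18 patients in Table 3-1
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# Minimum and maximum, which many authors report instead of the range
low, high = min(shock_index), max(shock_index)
print(low, high)

# Range: largest observation minus smallest observation
print(round(high - low, 2))  # 0.97
```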

The Standard Deviation

The standard deviation is the most commonly used measure of dispersion
with medical and health data. Although its meaning and computation
are somewhat complex, it is very important because it is used both
to describe how observations cluster around the mean and in many
statistical tests. Most of you will use a computer to determine
the standard deviation, but we present the steps involved in its
calculation to give a greater understanding of the meaning of this
statistic.

The standard deviation is a measure
of the spread of data about their mean. Briefly looking at the logic
behind this statistic, we need a measure of the "average" spread
of the observations about the mean. Why not find the deviation of
each observation from the mean, add these deviations, and divide
the sum by n to form an analogy to
the mean itself? The problem is that the sum of deviations about
the mean is always zero (see Exercise 1). Why not use the absolute
values of the deviations? The absolute
value of a number ignores the sign of the number and is denoted
by vertical bars on each side of the number. For example, the absolute
value of 5, |5|,
is 5, and the absolute value of –5, |-5|,
is also 5. Although this approach avoids the zero sum problem, it
lacks some important statistical properties, and so is not used.
Instead, the deviations are squared before
adding them, and then the square root is found to express the standard
deviation on the original scale of measurement. The standard deviation
is symbolized as SD, sd, or simply s (in this text we use SD), and its formula is

SD = √[Σ(X – X̄)^{2}/(n – 1)]

where X̄ denotes the mean. The name of the statistic before the square root is taken is
the variance, but the standard deviation
is the statistic of primary interest.

Using n – 1 instead of n in the denominator produces a more
accurate estimate of the true population standard deviation and
has desirable mathematical properties for statistical inferences.

The preceding formula for standard deviation, called the definitional formula, is not the easiest
one for calculations. Another formula, the computational
formula, is generally used instead. Because we generally compute
the standard deviation using a computer, the illustrations in this
text use the more meaningful but computationally less efficient
formula. If you are curious, the computational formula is given
in Exercise 7.

Now let's try a calculation. The shock index values
for the 18 patients are repeated in Table 3–3 along with
the computations needed. The steps follow:

Table 3–3. Calculations
for Standard Deviation of Shock Index in a Random Sample of 18 Patients.

Patient     X      X – X̄    (X – X̄)^{2}
   1       0.61    –0.08       0.01
   2       0.56    –0.13       0.02
   3       0.52    –0.17       0.03
   4       0.33    –0.36       0.13
   5       0.45    –0.24       0.06
   6       0.74     0.05       0.00
   7       0.73     0.04       0.00
   8       0.92     0.23       0.05
   9       0.42    –0.27       0.07
  10       0.63    –0.06       0.00
  11       0.55    –0.14       0.02
  12       0.50    –0.19       0.04
  13       0.75     0.06       0.00
  14       0.82     0.13       0.02
  15       1.30     0.61       0.38
  16       1.29     0.60       0.36
  17       0.85     0.16       0.03
  18       0.44    –0.25       0.06
Sums      12.41                1.28
Mean       0.69

Source: Data, used with
permission of the authors and publisher, Kline JA, Nelson RD, Jackson
RE, Courtney DM: Criteria for the safe use of D-dimer
testing in emergency department patients with suspected pulmonary
embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152.
Table produced with Microsoft Excel.

1. Let X be the shock index
for each patient, and find the mean: the mean is 0.69, as we calculated
earlier.

2. Subtract the mean from each observation to form the deviations X – X̄.

3. Square each deviation to form (X – X̄)^{2}.

4. Add the squared deviations.

5. Divide the result in step 4 by n – 1:
1.28/17 = 0.075. This value is the variance.

6. Take the square root of the value in step 5 to find the
standard deviation; we have 0.274 or 0.27. (The value computed from
the unrounded data is 0.275 or 0.28; the small difference is due to round-off error.)

But note the relatively large squared deviation of 0.38 for patient
15 in Table 3–3. It contributes substantially to the variation
in the data. The standard deviation of the remaining 17 patients
(after eliminating patient 15) is smaller, 0.235, demonstrating
the effect that outlying observations can have on the value of the
standard deviation.
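The six calculation steps above can be sketched directly in Python, using the shock index values from Table 3–3. Dropping the largest observation reproduces the effect of removing patient 15; the result (about 0.23) is close to the 0.235 reported in the text, the small difference being rounding in the tabulated values.

```python
from math import sqrt
from statistics import stdev

# Shock index values for the 18 patients (Table 3-3)
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

n = len(shock_index)
m = sum(shock_index) / n                      # step 1: the mean
deviations = [x - m for x in shock_index]     # step 2: X - mean
squared = [d * d for d in deviations]         # step 3: squared deviations
variance = sum(squared) / (n - 1)             # steps 4-5: the variance
sd = sqrt(variance)                           # step 6: the standard deviation

# Dropping patient 15's outlying 1.30 shows how a single extreme
# value inflates the standard deviation
sd_trimmed = stdev(sorted(shock_index)[:-1])
```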

The standard deviation, like the mean, requires numerical data.
Also, like the mean, the standard deviation is a very important
statistic. First, it is an essential part of many statistical tests as we will see in
later chapters. Second, the standard deviation is very useful in
describing the spread of the observations about the mean value.
Two rules of thumb when using the standard deviation are:

1. Regardless of how the observations are distributed,
at least 75% of the values always lie
between these two numbers: the mean minus
2 standard deviations and the mean
plus 2 standard deviations. In the shock index example, the
mean is 0.69 and the standard deviation is 0.28; therefore, at least 75% lie
between 0.69 ± 2(0.28), or between 0.13 and 1.25. In this example,
16 of the 18 observations, or 89%, fall between these limits.

2. If the distribution of observations is bell-shaped, then
even more can be said about the percentage of observations that
lie between the mean and ± 2 standard deviations. For a bell-shaped
distribution, approximately:

67% of the observations lie between the mean ± 1
standard deviation

95% of the observations lie between the mean ± 2
standard deviations

99.7% of the observations lie between the mean ± 3
standard deviations

The standard deviation, along with the mean, can be helpful in
determining skewness when only summary statistics are given: if
the mean minus 2 SD contains zero (ie, the mean is smaller than
2 SD), the observations are probably skewed.
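The first rule of thumb can be checked directly. This Python sketch counts how many of the 18 shock index values in Table 3–3 fall within the mean ± 2 standard deviations:

```python
from statistics import mean, stdev

# Shock index values for the 18 patients (Table 3-3)
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

m, sd = mean(shock_index), stdev(shock_index)
low, high = m - 2 * sd, m + 2 * sd            # the 2-SD limits
inside = sum(low <= x <= high for x in shock_index)
share = inside / len(shock_index)             # 16 of 18, about 89%
```

Only the two largest values (1.29 and 1.30) fall outside the limits, so the observed 89% comfortably exceeds the guaranteed minimum of 75%.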

Use the CD-ROM [available only with the book], and find the range and standard deviation of
shock index for all of the patients in the Kline and colleagues
study (2002). Repeat for patients with and without a PE. Are the
distributions of shock index similar in these two groups of patients?

The Coefficient
of Variation

The coefficient of variation (CV)
is a useful measure of relative spread
in data and is used frequently in the biologic sciences. For example,
suppose Kline and his colleagues (2002) wanted to compare the variability
in shock index with the variability in systolic blood pressure (BP)
in the patients in their study. The mean and the standard deviation
of shock index in the total sample are 0.69 and 0.20, respectively;
for systolic BP, they are 138 and 26, respectively. A comparison
of the standard deviations makes no sense because shock index and systolic
BP are measured on much different scales. The coefficient of variation
adjusts the scales so that a sensible comparison can be made.

The coefficient of variation is defined as the standard deviation
divided by the mean, times 100%. It produces a measure of
relative variation—variation that is relative to the size
of the mean. The formula for the coefficient
of variation is

CV = (SD/X̄)(100%)
From this formula, the CV for shock
index is (0.20/0.69)(100%) = 29.0%,
and the CV for systolic BP is (26/138)(100%) = 18.8%.
We can therefore conclude that the relative variation
in shock index is considerably greater than that in systolic
BP. A frequent application of the coefficient of variation in the
health field is in laboratory testing and quality control procedures.
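As a minimal sketch, the CV computation for the two sets of summary statistics quoted above:

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100%."""
    return sd / mean * 100

# Summary statistics reported in the text for the total Kline sample
cv_shock = coefficient_of_variation(0.20, 0.69)   # about 29.0%
cv_bp = coefficient_of_variation(26, 138)         # about 18.8%
```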

Use the CD-ROM [available only with the book] and find the coefficient of variation for shock
index for patients who did and did not have a PE in the Kline and
colleagues study.

Percentiles

A percentile is the percentage
of a distribution that is equal to or below a particular number.
For example, consider the standard physical growth chart for girls
from birth to 36 months old given in Figure 3–2. For girls
21 months of age, the 95th percentile of weight is 12 kg, as noted
by the arrow in the chart. This percentile means that among 21-month-old
girls, 95% weigh 12 kg or less and only 5% weigh
more than 12 kg. The 50th percentile is, of course, the same value
as the median; for 21-month-old girls, the median or 50th percentile
weight is approximately 10.6 kg.

Figure 3-2.

Standard physical growth chart. (Reproduced, with permission,
from Ross Laboratories.)

Percentiles are often used to compare an individual value with
a norm. They are extensively used to develop and interpret physical
growth charts and measurements of ability and intelligence. They
also determine normal ranges of laboratory values; the "normal
limits" of many laboratory values are set by the 2½ and
97½ percentiles, so that the normal limits contain the central
95% of the distribution. This approach was taken in a study
by Gelber and colleagues (1997) when they developed norms for mean
heart variation to breathing and Valsalva ratio (see Exercise 2).

Interquartile
Range

A measure of variation that makes use of percentiles is the interquartile range, defined as the
difference between the 25th and 75th percentiles, also called the first and third
quartiles, respectively. The interquartile range contains the
central 50% of observations. For example, the interquartile
range of weights of girls who are 9 months of age (see Figure 3–2)
is the difference between 7.5 kg (the 75th percentile) and 6.5 kg
(the 25th percentile); that is, 50% of infant girls weigh
between 6.5 and 7.5 kg at 9 months of age.
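Quartiles and the interquartile range can be computed with the standard library, here applied to the 18 shock index values of Table 3–3. Note that statistical packages interpolate percentiles in slightly different ways, so software results may differ a little from hand calculations.

```python
from statistics import quantiles

# Shock index values for the 18 patients (Table 3-3)
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# 'inclusive' interpolates between order statistics of the sample itself
q1, q2, q3 = quantiles(shock_index, n=4, method='inclusive')
iqr = q3 - q1   # the central 50% of shock indices spans this interval
```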

Using Different
Measures of Dispersion

The following guidelines are useful in deciding which measure
of dispersion is most appropriate for a given set of data.

1. The standard deviation is used when the mean is used
(ie, with symmetric numerical data).

2. Percentiles and the interquartile range are used in two
situations:

a. When the median is used (ie, with ordinal data or with
skewed numerical data).

b. When the mean is used but the objective is to compare individual
observations with a set of norms.

3. The interquartile range is used to describe the central
50% of a distribution, regardless of its shape.

4. The range is used with numerical data when the purpose
is to emphasize extreme values.

5. The coefficient of variation is used when the intent is
to compare distributions measured on different scales.

Displaying Numerical
Data in Tables & Graphs

We all know the saying, "A picture is worth 1000 words," and
researchers in the health field certainly make frequent use of graphic
and pictorial displays of data. Numerical data may be presented
in a variety of ways, and we will use the data from the study by
Hébert and colleagues (1997) on functional decline in the
elderly (Presenting Problem 2) to illustrate some of the more common
methods. The subjects in this study were 75 years of age or older.
We use a subset of their data, the 72 patients age 85 years or older
who completed the Functional Autonomy Measurement System (SMAF).
The total scores on the SMAF for these subjects in year 1 and year 3,
and the differences in score between year 3 and year 1, are given
in Table 3–4.

Table 3–4. Difference
in Total Score on the Functional Autonomy Measurement System for
Patients Age 85 Years or Older. Positive Differences Indicate a
Decline.

Age   Sex   SMAF at Time 1   SMAF at Time 3   Difference (Time 3 – Time 1)
90     F         28               20                    –8
88     F          8               11                     3
88     F          6                9                     3
90     F         22               18                    –4
88     M          6                7                     1
86     F          9                9                     0
86     M         23               15                    –8
85     F         12               40                    28
88     F          9               30                    21
86     F          5               15                    10
95     F         20               16                    –4
88     F          3               26                    23
87     F         22               24                     2
86     F         20               20                     0
86     M          0                1                     1
93     F         30               34                     4
87     F         13               23                    10
94     F         47               52                     5
86     F          1               20                    19
85     F          3               50                    47
87     M          4               57                    53
89     F         12               14                     2
87     F          1                4                     3
87     F         13               16                     3
85     F          1                1                     0
85     F         35               30                    –5
88     F         22               19                    –3
88     M          1                1                     0
86     F          2               17                    15
88     M          3                3                     0
86     F         21               39                    18
85     F          2                2                     0
85     M          7                8                     1
88     M          8               10                     2
85     F          7                5                    –2
89     F         11               20                     9
87     F          1                0                    –1
88     F         12               19                     7
87     F         19               56                    37
94     F         21               16                    –5
86     M         17               26                     9
85     F         27               21                    –6
85     M          4                2                    –2
85     F          9                5                    –4
85     M          7               34                    27
87     F         38               34                    –4
85     F         13               22                     9
85     F          4                4                     0
85     F         17               27                    10
90     F         23               27                     4
86     M         12               13                     1
88     M         30               29                    –1
85     M         27               26                    –1
87     F         26               47                    21
86     M         44               46                     2
85     F         21               23                     2
86     M         17               57                    40
88     M         10               19                     9
85     F         15               22                     7
86     F          4                6                     2
88     F         10               12                     2
88     M         18               22                     4
87     M         12               20                     8
85     M         37               47                    10
85     F         17               14                    –3
89     F         14               19                     5
85     F         11               14                     3
87     F          4                6                     2
86     F         16               26                    10
90     F          5                6                     1
85     F         48               51                     3
88     M          9               17                     8

Source: Data, used with
permission of the author and the publisher, from Hébert
R, Brayne C, Spiegelhalter D: Incidence of functional decline and
improvement in a community-dwelling very elderly population. Am
J Epidemiol 1997;145:935–944.

Stem-and-Leaf Plots

Stem-and-leaf plots are graphs developed in 1977 by Tukey, a
statistician interested in meaningful ways to communicate by visual
display. They provide a convenient means of tallying the observations
and can be used as a direct display of data or as a preliminary
step in constructing a frequency table. The observations in Table
3–4 show that many of the differences in total scores are small,
but also that some people have large positive scores, indicating
large declines in function. The data are not easy to understand,
however, by simply looking at a list of the raw numbers. The first
step in organizing data for a stem-and-leaf plot is to decide on
the number of subdivisions, called classes or intervals (it should
generally be between 6 and 14; more details on this decision are
given in the following section). Initially, we categorize observations
by 5s, from –9 to –5, –4 to 0, 1 to 5,
6 to 10, 11 to 15, 16 to 20, and so on.

To form a stem-and-leaf plot, draw
a vertical line, and place the first digits of each class—called
the stem—on the left side of the line, as in Table 3–5.
The numbers on the right side of the vertical line represent the
second digit of each observation; they are the leaves. The steps
in building a stem-and-leaf plot are as follows:

Table 3–5. Constructing
a Stem-and-Leaf Plot of Change in Total Function Scores Using 5-Point
Categories: Observations for the First 10 Subjects.

Stem         Leaves
–9 to –5     8 8
–4 to 0      4 0
+1 to +5     3 3 1
+6 to +10    0
+11 to +15
+16 to +20
+21 to +25   1
+26 to +30   8
+31 to +35
+36 to +40
+41 to +45
+46 to +50
+51 to +55
+56 to +60

Source: Data, used with
permission of the authors and the publisher, from Hébert
R, Brayne C, Spiegelhalter D: Incidence of functional decline and
improvement in a community-dwelling, very elderly population. Am
J Epidemiol 1997;145:935–944.

1. Take the score of the first person, –8, and
write the second digit, 8, or leaf, on the right side
of the vertical line, opposite the first digit, or stem, corresponding
to –9 to –5.

2. For the second person, write the 3 (leaf) on the right
side of the vertical line opposite 1 to 5 (stem).

3. For the third person, write the 3 (leaf) opposite 1 to
5 (stem) next to the previous score of 3.

4. For the fourth person, write the –4 (leaf) opposite –4
to 0 (stem); and so on.

5. When the observation is only one digit, such as for subjects
1 through 7 in Table 3–4, that digit is the leaf.

6. When the observation is two digits, however, such as the
score of 28 for subject 8, only the second digit, or 8 in this case,
is written.

The leaves for the first ten people are given in Table 3–5.
The complete stem-and-leaf plot for the score changes of all the
subjects is given in Table 3–6. The plot both provides
a tally of observations and shows how the changes in scores are
distributed. The choice of class widths of 5 points is reasonable,
although we usually prefer to avoid having many empty classes at
the high end of the scale. It is generally preferred to have equal
class widths and to avoid open-ended intervals, such as 30 or higher,
although some might choose to combine the higher classes in the
final plot.
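The tallying procedure described in the steps above can be sketched as a short Python function. The class boundaries and the leaf rule (the final digit of each observation) follow the text; the function itself and its signature are our own construction, not from the book.

```python
from collections import defaultdict

def stem_and_leaf(values, width=5, lo=-9):
    """Tally each value into a width-point class starting at lo;
    the leaf is the final digit of the value's absolute magnitude."""
    plot = defaultdict(list)
    for x in values:
        idx = (x - lo) // width                 # class index (floor division)
        start = lo + idx * width
        plot[(start, start + width - 1)].append(abs(x) % 10)
    return dict(plot)

# Score changes for the first 10 subjects in Table 3-4
first_ten = [-8, 3, 3, -4, 1, 0, -8, 28, 21, 10]
plot = stem_and_leaf(first_ten)
```

Applied to the first ten subjects, this reproduces the tallies of Table 3–5: leaves 8 8 in the –9 to –5 class, 4 0 in –4 to 0, and so on.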

Table 3–6. Stem-and-Leaf
Plot of Change in Total Function Scores Using 5-Point Categories.

Stem         Leaves
–9 to –5     8 8 5 5 6
–4 to 0      4 0 4 0 0 3 0 0 0 2 1 2 4 4 0 1 1 3
+1 to +5     3 3 1 2 1 4 5 2 3 3 1 2 4 1 2 2 2 2 4 5 3 2 1 3
+6 to +10    0 0 9 7 9 9 0 9 7 8 0 8 0
+11 to +15   5
+16 to +20   9 8
+21 to +25   1 3 1
+26 to +30   8 7
+31 to +35
+36 to +40   7 0
+41 to +45
+46 to +50   7
+51 to +55   3
+56 to +60

Source: Data, used with
permission of the authors and the publisher, from Hébert
R, Brayne C, Spiegelhalter D: Incidence of functional decline and
improvement in a community-dwelling, very elderly population. Am
J Epidemiol 1997;145:935–944.

Usually the leaves are reordered from lowest to highest within
each class. After the reordering, it is easy to locate the median
of the distribution by simply counting in from either end.

Use the CD-ROM [available only with the book] and the routine for generating stem-and-leaf plots
with the data on shock index separately for patients who did and
did not have a PE in the Kline and colleagues study (2002).

Frequency Tables

Scientific journals often present information in frequency distributions
or frequency tables. The scale of the observations must first be
divided into classes, as in stem-and-leaf plots. The number of observations
in each class is then counted. The steps for constructing a frequency
table are as follows:

1. Identify the largest and smallest observations.

2. Subtract the smallest observation from the largest to obtain
the range.

3. Determine the number of classes. Common sense is usually adequate
for making this decision, but the following guidelines may be helpful.

a. Between 6 and 14 classes is generally adequate to provide
enough information without being overly detailed.

b. The number of classes should be large enough to demonstrate
the shape of the distribution but not so many that minor fluctuations
are noticeable.

4. One approach is to divide the range of observations by the
number of classes to obtain the width of the classes. For some applications,
deciding on the class width first may make more sense; then use
the class width to determine the number of classes. The following
are some guidelines for determining class width.

a. Class limits (beginning
and ending numbers) must not overlap. For example, they must be
stated as "40–49" or "40 up
to 50," not as "40–50" or "50–60." Otherwise,
we cannot tell the class to which an observation of 50 belongs.

b. If possible, class widths should be equal. Unequal class widths
present graphing problems and should be used only when large gaps
occur in the data.

c. If possible, open-ended classes at the upper or lower end
of the range should be avoided because they do not accurately communicate
the range of the observations. We used open-ended classes in Table
3–2 when we had the categories of 0.40 or less and 1.20
or higher.

d. If possible, class limits should be chosen so that most of
the observations in the class are closer to the midpoint of the
class than to either end of the class. Doing so results in a better
estimate of the raw data mean when the weighted mean is calculated
from a frequency table (see the section titled, "The
Mean" and Exercise 3).

5. Tally the number of observations in each class.
If you are constructing a stem-and-leaf plot, the actual value of
the observation is noted. If you are constructing a frequency table,
you need use only the number of observations that fall within the
class.

Computer programs generally list each value, along with its frequency.
Users of the programs must designate the class limits if they want
to form frequency tables for values in specific intervals, such
as in Table 3–2, by recoding the original observations.

Some tables present only frequencies (number of patients or subjects);
others present percentages as well. Percentages are
found by dividing the number of observations in a given class, n_{i}, by the total number of
observations, n, and then multiplying
by 100. For example, for the shock index class from 0.40 up to 0.50
in Table 3–2, the percentage is the count in that class divided by
the total number of patients, times 100%.
For some applications, cumulative frequencies, or percentages,
are desirable. The cumulative frequency is
the percentage of observations for a given value plus that for all
lower values. The cumulative value in the last column of Table 3–2,
for instance, shows that almost 75% of patients had a shock index
less than 0.80.
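The percentage and cumulative percentage columns of a frequency table such as Table 3–8 can be generated as in the following sketch; the class counts used here are those for patients without a PE.

```python
def frequency_table(counts):
    """Count, cumulative count, percent, and cumulative percent per class."""
    n, cum, rows = sum(counts), 0, []
    for count in counts:
        cum += count
        rows.append((count, cum, round(100 * count / n, 2),
                     round(100 * cum / n, 2)))
    return rows

# Class counts for patients without a PE (Table 3-8, part A); n = 751
rows = frequency_table([33, 94, 160, 161, 120, 89, 39, 28, 15, 12])
```

Each row reproduces the corresponding line of Table 3–8: for example, the fourth class accumulates to 448 patients, or 59.65% of the 751 observations.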

Histograms,
Box Plots, & Frequency Polygons

Graphs are used extensively in medicine—in journals,
in presentations at professional meetings, and in advertising literature.
Graphic devices especially useful in medicine are histograms, box plots,
error plots, line graphs, and scatterplots.

Histograms

A histogram of the changes in scores for elderly subjects in
the Hébert and coworkers' (1997) study of functional
stability is shown in Figure 3–3. Histograms usually
present the measure of interest along the X-axis
and the number or percentage of observations along the Y-axis. Whether numbers or percentages
are used depends on the purpose of the histogram. For example, percentages
are needed when two histograms based on different numbers of subjects
are compared.

Figure 3-3.

Histogram of change in total function scores using 5-point
categories. (Data used with permission, from Hébert R,
Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in
a community-dwelling, very elderly population. Am
J Epidemiol 1997;145:935–944.
Graph produced with SPSS, a registered trademark of SPSS, Inc.;
used with permission.)

You may notice that the numbers of patients in each class are
different from the numbers we found when creating the stem-and-leaf
plots. This incongruity occurs because most statistical computer
programs determine the class limits automatically. As with frequency
tables, it is possible to recode the numbers into a new measure
if you want to specify the class limits.

Note that the area of each bar is in proportion to the percentage
of observations in that interval; for example, the nine observations
in the –5 interval (values between –7.5 and –2.5)
account for 9/72, or 12.5%, of the area covered
by this histogram. A histogram therefore communicates information about area, one reason the width of classes
should be equal; otherwise the heights of columns in the histogram
must be appropriately modified to maintain the correct area. For
example, in Figure 3–3, if the lowest class were 10 score
points wide (from –12.5 to –2.5) and all other
classes remained 5 score points wide, 11 observations would fall
in the interval. The height of the column for that interval should
then be only 5.5 units (instead of 11 units) to compensate for its
doubled width.
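The height adjustment for unequal class widths amounts to scaling each bar by the ratio of the base width to its own width, as this sketch shows (reproducing the 11-observation example above):

```python
def bar_height(count, class_width, base_width=5):
    """Scale a histogram bar so its area, not its height, reflects the
    count when a class is wider than the base class width."""
    return count * base_width / class_width

# 11 observations in a class twice the usual width: draw it 5.5 units high
adjusted = bar_height(11, 10)
usual = bar_height(11, 5)   # an ordinary 5-point class keeps its full height
```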

Box Plots

A box plot, sometimes called a box-and-whisker plot by
Tukey (1977)
, is another way to display information
when the objective is to illustrate certain locations in the distribution.
It can be constructed from the information in a stem-and-leaf plot
or a frequency table. A stem-and-leaf plot for patients 85 years
of age or older is given in Table 3–7. The median and the
first and third quartiles of the distribution are used in constructing
box plots. Computer programs do not routinely denote the median and
the 25th and 75th percentiles with stem-and-leaf plots, but it is easy
to request this information, as illustrated in Table 3–7.
The median change in SMAF score is 2.5, the 75th percentile is 9,
and the 25th percentile is 0.

Table 3–7. Descriptive
Information and Stem-and-Leaf Plot of SMAF Score Changes for Subjects
85 Years Old or Older.

Quartile Section of SMAF Score Changes

Parameter   10th Percentile   25th Percentile   50th Percentile   75th Percentile   90th Percentile
Value            –4                 0                 2.5               9                22.4
95% LCL          –6                –3                 1                 5                10
95% UCL          –2                 1                 4                18                40

Stem-and-Leaf Section of SMAF Score Changes

Depth   Stem   Leaves
  2     –0.    88
  3        S   6
  9        F   554444
 13        T   3322
 19     –0*    111000
 28      0*    000011111
(14)       T   22222222333333
 30        F   44455
 25        S   77
 23        .   889999
 17      1*    00000
 12        T
 12        F   5
 11        S
 11        .   89
  9      2*    11
        High   23, 27, 28, 37, 40, 47, 53

Unit = 1 Example: 1 | 2 Represents 12

Source: Data, used with
permission of the authors and the publisher, from Hébert
R, Brayne C, Spiegelhalter D: Incidence of functional decline and
improvement in a community-dwelling, very elderly population. Am
J Epidemiol 1997;145:935–944. Plot produced with NCSS;
used with permission.

Note that the values for the stem are different from the ones
we used because most computer programs determine the stem values.
In Table 3–7, there is a stem value for every two values
of change in SMAF score. Although it is not very intuitive, the
stem values represent class sizes of 2, with the symbols denoting
the trailing digits: * for numbers ending in 0 and 1; T for
2 and 3; F for 4 and 5; S for 6 and 7; and a period (.) for 8 and 9.

A box plot of the changes in SMAF scores for patients 85 years
or older is given in Figure 3–4.^{b} A box is drawn
with the top at the third quartile and the bottom at the first quartile;
quartiles are sometimes referred to as hinges in
box plots. The length of the box is a visual representation of the
interquartile range, representing the middle 50% of the
data. The width of the box is chosen to be pleasing esthetically.
The location of the midpoint or median of the distribution is indicated
with a horizontal line in the box. Finally, straight lines, or whiskers, extend 1.5 times the interquartile
range above and below the 75th and 25th percentiles. Any values
above or below the whiskers are called outliers.
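The quantities that define a box plot (quartiles, whisker limits, and outliers) can be computed from the raw score changes of Table 3–4. With the conventional 1.5 × interquartile range rule, this sketch recovers the seven outliers visible in Figure 3–4; as noted for percentiles, different programs interpolate quartiles slightly differently.

```python
from statistics import quantiles

# Change in SMAF score for all 72 subjects (Table 3-4, in row order)
diffs = [-8, 3, 3, -4, 1, 0, -8, 28, 21, 10,
         -4, 23, 2, 0, 1, 4, 10, 5, 19, 47,
         53, 2, 3, 3, 0, -5, -3, 0, 15, 0,
         18, 0, 1, 2, -2, 9, -1, 7, 37, -5,
         9, -6, -2, -4, 27, -4, 9, 0, 10, 4,
         1, -1, -1, 21, 2, 2, 40, 9, 7, 2,
         2, 4, 8, 10, -3, 5, 3, 2, 10, 1,
         3, 8]

q1, q2, q3 = quantiles(diffs, n=4, method='inclusive')  # hinges and median
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr           # whisker limits
outliers = sorted(x for x in diffs if x < lower or x > upper)
```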

^{b}For this analysis, we selected the following patients: 85
years, score on the total SMAF at time 3 –1.

Figure 3-4.

Box plot of SMAF score changes for subjects 85 years
old or older. (Data used with permission, from Hébert R,
Brayne C, Spiegelhalter D: Incidence of functional decline and improvement
in a community-dwelling, very elderly population. Am J Epidemiol 1997;145:935–944. Plot produced
with NCSS; used with permission.)

Box plots communicate a great deal of information; for example,
we can easily see from Figure 3–4 that the score changes
range from about –10 to about 55 (actually, from –8
to 53). Half of the score changes were between about 0 and 8, and
the median is a little larger than 0. There are seven outlying values;
four patients had score changes greater than 35 points.

Use the CD-ROM [available only with the book] to generate box plots for shock index separately
for patients with and without a PE in the Kline and colleagues study
(2002). Do these graphs enhance your understanding of the distributions?

Frequency Polygons

Frequency polygons are line graphs similar to histograms and
are especially useful when comparing two distributions on the same
graph. As a first step in constructing a frequency polygon, a stem-and-leaf
plot or frequency table is generated. Table 3–8 contains
the frequencies of shock index for patients who did and did not have
a PE.

Table 3–8. Frequency
Table for Shock Index.

A. Shock Index in Patients Not Having a Pulmonary Embolism

Category         Count   Cumulative Count   Percent   Cumulative Percent   Graph of Percent
0.40 or less       33           33            4.39           4.39          /
0.40 thru 0.49     94          127           12.52          16.91          /////
0.50 thru 0.59    160          287           21.30          38.22          ////////
0.60 thru 0.69    161          448           21.44          59.65          ////////
0.70 thru 0.79    120          568           15.98          75.63          //////
0.80 thru 0.89     89          657           11.85          87.48          ////
0.90 thru 0.99     39          696            5.19          92.68          //
1.00 thru 1.09     28          724            3.73          96.40          /
1.10 thru 1.19     15          739            2.00          98.40          /
1.20 or higher     12          751            1.60         100.00          /

B. Shock Index in Patients Having a Pulmonary Embolism

Category         Count   Cumulative Count   Percent   Cumulative Percent   Graph of Percent
0.40 or less        5            5            2.76           2.76          /
0.40 thru 0.49     10           15            5.52           8.29          ///
0.50 thru 0.59     38           53           20.99          29.28          ////////
0.60 thru 0.69     38           91           20.99          50.28          ////////
0.70 thru 0.79     35          126           19.34          69.61          ///////
0.80 thru 0.89     13          139            7.18          76.80          ///
0.90 thru 0.99     21          160           11.60          88.40          ////
1.00 thru 1.09      9          169            4.97          93.37          /
1.10 thru 1.19      4          173            2.21          95.58          /
1.20 or higher      8          181            4.42         100.00          /

Source: Data, used with
permission of the authors and publisher, Kline JA, Nelson RD, Jackson
RE, Courtney DM: Criteria for the safe use of D-dimer
testing in emergency department patients with suspected pulmonary
embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152.
Table produced with NCSS; used with permission.

Figure 3–5 is a histogram based on the frequencies for
patients who had a PE with a frequency polygon superimposed on it.
It demonstrates that frequency polygons are constructed by connecting
the midpoints of the columns of a histogram. Therefore, the same
guidelines hold for constructing frequency polygons as for constructing
frequency tables and histograms. Note that the line extends from
the midpoint of the first and last columns to the X-axis in order to close up both ends
of the distribution and indicate zero frequency of any values beyond
the extremes. Because frequency polygons are based on a histogram,
they also portray area.

Figure 3-5.

Frequency polygon of shock index for patients with a
pulmonary embolism. (Data, used with permission, from Kline JA,
Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use of D-dimer testing in emergency department
patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Plot produced
with NCSS; used with permission.)

Graphs Comparing
Two or More Groups

Merely looking at the numbers in Table 3–8 is insufficient
for deciding whether the distributions of shock index are similar for
patients with and without a PE. Several methods are
useful for comparing distributions.

Box plots are very effective when there is more than one group
and are shown for shock index among patients with and without a
PE in Figure 3–6. The distributions of the shock index
are similar, although more variability exists in patients without
a PE, and the median index is slightly higher in those who did have
a PE. Does a difference exist between the two groups? We will have to
wait until Chapter 6 to learn the answer.

Figure 3-6.

Box plot of shock index for patients with and without
a pulmonary embolism. (Data, used with permission, from Kline JA,
Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use of D-dimer testing in emergency department
patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Plot produced
with NCSS; used with permission.)

Percentage polygons are also useful
for comparing two frequency distributions. Percentage polygons for
shock index in both patients with and without PE are illustrated
in Figure 3–7. Frequencies must be converted to percentages
when the groups being compared have unequal numbers of observations,
and this conversion has been made for Figure 3–7. The distribution
of shock index does not appear to be very different for the two
patient groups; most of the area in one polygon
is overlapped by that in the other. Thus, the visual message of
box plots and frequency polygons is consistent.

Figure 3-7.

Frequency polygon of shock index for patients with and
without a pulmonary embolism. (Data, used with permission, from
Kline JA, Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use
of D-dimer testing in emergency department
patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Graph produced
with SPSS, a registered trademark of SPSS, Inc.; used with permission.)

Another type of graph often used in the medical literature is
an error bar plot. Figure 3–8 contains error bars for patients
with and without a PE. The circle designates the mean, and the bars
illustrate the standard deviation, although some authors use the
mean and standard error (a value smaller than the standard deviation,
discussed in Chapter 4). We recommend using standard deviations
and discuss this issue further in Chapter 4. The error bars indicate
the similarity of the distributions, just as the percentage polygons
and the box plots do.

Figure 3-8.

Error bar charts of shock index for patients with and
without a pulmonary embolism. (Data, used with permission, from
Kline JA, Nelson RD, Jackson RE, Courtney DM: Criteria for the safe
use of D-dimer testing in emergency
department patients with suspected pulmonary embolism: A multicenter
US study. Ann Emergency Med 2002;39:144–152. Graph produced
with SPSS, a registered trademark of SPSS, Inc.; used with permission.)

Look at Figures 3–6, 3–7, and 3–8 and
decide which you think provides the most useful information.

Summarizing
Nominal & Ordinal Data with Numbers

When observations are measured on a nominal, or categorical, scale, the methods just
discussed are not appropriate. Characteristics measured on a nominal scale
do not have numerical values but are counts or frequencies of occurrence.
The study on domestic violence examined a number of characteristics
about physicians, including previous training in domestic violence,
to help explain differences in screening behavior (Lapidus et al, 2002).
Both screening and previous training are dichotomous, or binary, meaning that only two categories
are possible. In this section, we examine measures that can be used
with such observations.

Ways to Describe
Nominal Data

Nominal data can be measured using several methods: proportions,
percentages, ratios, and rates. To illustrate these measures, we
will use the numbers of physicians who screened patients for domestic
violence based on whether they had previous training; the data are
given in Table 3–9.

Table 3–9. Physician Screening Prevalence by Demographic, Practice,
and Domestic Violence Training Characteristics for 438 Respondents.

                                    Screen
                            Yes      No     Total
Location of practice
  Urban                      97      28       125
  Suburban                  199      99       298
  Rural                      34      11        45
Type of practice
  Private                   261     110       371
  Other                      40      12        52
Teaching residents
  Yes                       154      62       216
  No                        175      76       251
Previous DV training
  Yes                       175      27       202
  No                        155     111       266

Source: Data, used with
permission of the authors and publisher, Lapidus G, Cooke MB, Gelven
E, Sherman K, Duncan M, Bancol L: A statewide survey of domestic
violence screening behaviors among pediatricians and family physicians.
Arch Pediatr Adolesc Med 2002;156:332–336. Table produced
with Microsoft Excel.

Proportions
and Percentages

A proportion is the number, a, of
observations with a given characteristic (such as those who screened
for domestic violence) divided by the total number of observations, a + b, in
a given group (such as those who had previous training). That is,

    proportion = a / (a + b)

A proportion is always defined as a part divided
by the whole and is useful for ordinal
and numerical data as well as nominal data, especially when the
observations have been placed in a frequency table. In the domestic
violence study, the proportion of physicians trained in domestic
violence who subsequently screened patients is 175/202 = 0.866,
and the proportion without training who subsequently screened patients
is 155/266 = 0.583.

A percentage is simply the proportion
multiplied by 100%.
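
The proportion and percentage calculations above are easy to verify
with a few lines of code. This sketch (not from the book) uses the
trained and untrained physician counts from Table 3–9:

```python
# Proportion of physicians who screened for domestic violence, by
# previous DV training (counts from Table 3-9).
def proportion(a, b):
    """The part a divided by the whole a + b."""
    return a / (a + b)

trained = proportion(175, 27)     # screened vs. not screened, trained
untrained = proportion(155, 111)  # screened vs. not screened, untrained

print(round(trained, 3))        # 0.866
print(round(untrained, 3))      # 0.583
print(f"{trained * 100:.1f}%")  # 86.6%
```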

Ratios and Rates

A ratio is the number of observations
in a group with a given characteristic divided by the number of observations
without the given characteristic:

    ratio = a / b

A ratio is always defined as a part divided
by another part. For example, among
physicians who trained, the ratio of those who screened patients
to those who did not is 175/27 = 6.481. Other
familiar ratios in medicine include ratios of the three components
of cholesterol (HDL, LDL, triglycerides), such as the LDL/HDL
ratio.

Rates are similar to proportions
except that a multiplier (eg, 1000, 10,000, or 100,000) is used,
and they are computed over a specified period of time. The multiplier
is called the base, and the formula is

    rate = [a / (a + b)] × base

For example, if a study lasted exactly 1 year and the proportion
of patients with a given condition was 0.002, the rate per 10,000 patients would be (0.002) × (10,000),
or 20 per 10,000 patients per year.
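
Both computations can be sketched the same way; the counts are again
from Table 3–9, and the 0.002 proportion with a base of 10,000 is the
text's own example:

```python
# A ratio is a part divided by another part (Table 3-9, trained
# physicians); a rate is a proportion times a base over a stated period.
screened, not_screened = 175, 27
ratio = screened / not_screened
print(round(ratio, 3))  # 6.481

proportion_with_condition = 0.002  # observed over exactly 1 year
base = 10_000
rate = proportion_with_condition * base
print(round(rate, 1))  # 20.0 per 10,000 patients per year
```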

Vital Statistics
Rates

Rates are very important in epidemiology and evidence-based medicine;
they are the basis of the calculation of vital statistics, which
describe the health status of populations. Some of the most commonly
used rates are briefly defined in the following sections.

Mortality Rates

Mortality rates provide a standard
way to compare numbers of deaths occurring in different populations,
deaths due to different diseases in the same population, or deaths
at different periods of time. The numerator in a mortality rate
is the number of people who died during a given period of time,
and the denominator is the number of people who were at risk of
dying during the same period. Because the denominator is often difficult
to obtain, the number of people alive in the population halfway through
the time period is frequently used as an estimate. Table 3–10
gives death data from Vital Statistics of
the United States.

Table 3–10. Number of Deaths, Death Rates, and Age-Adjusted Death
Rates, by Race and Sex: United States 1987–1996^{a}

Number of Deaths in the United States: 1987–1996

                 All Races                           White                              Black
Year   Both Sexes      Male     Female   Both Sexes     Male     Female   Both Sexes     Male    Female
1996    2,314,690  1,163,569  1,151,121   1,992,966   991,984  1,000,982     282,089  149,472   132,617
1995    2,312,132  1,172,959  1,139,173   1,987,437   997,277    990,160     286,401  154,175   132,226
1994    2,278,994  1,162,747  1,116,247   1,959,875   988,823    971,052     282,379  153,019   129,360
1993    2,268,553  1,161,797  1,106,756   1,951,437   988,329    963,108     282,151  153,502   128,649
1992    2,175,613  1,122,336  1,053,277   1,873,781   956,957    916,824     269,219  146,630   122,589
1991    2,169,518  1,121,665  1,047,853   1,868,904   956,497    912,407     269,525  147,331   122,194
1990    2,148,463  1,113,417  1,035,046   1,853,254   950,812    902,442     265,498  145,359   120,139
1989    2,150,466  1,114,190  1,036,276   1,853,841   950,852    902,989     267,642  146,393   121,249
1988    2,167,999  1,125,540  1,042,459   1,876,906   965,419    911,487     264,019  144,228   119,791
1987    2,123,323  1,107,958  1,015,365   1,843,067   953,382    889,685     254,814  139,551   115,263

Death Rates in the United States per 100,000: 1987–1996

                 All Races                           White                              Black
Year   Both Sexes     Male   Female     Both Sexes     Male   Female     Both Sexes      Male   Female
1996        872.5    896.4    849.7          906.9    918.1    896.2          842.0     939.9    753.5
1995        880.0    914.1    847.3          911.3    932.1    891.3          864.2     980.7    759.0
1994        875.4    915.0    837.6          905.4    931.6    880.1          864.3     987.8    752.9
1993        880.0    923.5    838.6          908.5    938.8    879.4          876.8   1,006.3    760.1
1992        852.9    901.6    806.5          880.0    917.2    844.3          850.5     977.5    736.2
1991        860.3    912.1    811.0          886.2    926.2    847.7          864.9     998.7    744.5
1990        863.8    918.4    812.0          888.0    930.9    846.9          871.0   1,008.0    747.9
1989        871.3    926.3    818.9          893.2    936.5    851.8          887.9   1,026.7    763.2
1988        886.7    945.1    831.2          910.5    957.9    865.3          888.3   1,026.1    764.6
1987        876.4    939.3    816.7          900.1    952.7    849.8          868.9   1,006.2    745.7

Age-Adjusted Death Rates in the United States per 100,000: 1987–1996

                 All Races                           White                              Black
Year   Both Sexes     Male   Female     Both Sexes     Male   Female     Both Sexes      Male   Female
1996        491.6    623.7    381.0          466.8    591.4    361.9          738.3     967.0    561.0
1995        503.9    646.3    385.2          476.9    610.5    364.9          765.7   1,016.7    571.0
1994        507.4    654.6    385.2          479.8    617.9    364.9          772.1   1,029.9    572.0
1993        513.3    664.9    388.3          485.1    627.5    367.7          785.2   1,052.2    578.8
1992        504.5    656.0    380.3          477.5    620.9    359.9          767.5   1,026.9    568.4
1991        513.7    669.9    386.5          486.8    634.4    366.3          780.7   1,048.9    575.1
1990        520.2    680.2    390.6          492.8    644.3    369.9          789.2   1,061.3    581.6
1989        528.0    689.3    397.3          499.6    652.2    376.0          805.9   1,082.8    594.3
1988        539.9    706.1    406.1          512.8    671.3    385.3          809.7   1,083.0    601.0
1987        539.2    706.8    404.6          513.7    674.2    384.8          796.4   1,063.6    592.4

^{a}Crude rates on an annual basis per 100,000
population in specified group; age-adjusted rates per 100,000 U.S.
standard million population. Rates are based on populations enumerated
as of April 1 for census years and estimated as of July 1 for all
other years. Excludes deaths of nonresidents of the United States.

Source: Adapted, with permission,
from Peters KD, Kochanek KD, Murphy SL: Deaths: Final data for 1996.
National Vital Statistics Report: Vol. 47, no. 9, p. 16. National
Center for Health Statistics, 1998.

A crude rate is a rate computed over all individuals in a given
population. For example, the crude annual mortality rate in the
entire population from Table 3–10 is 872.5 per 100,000
in 1996. The sex-specific mortality rate for
males is 896.4 during that same year, and for females it is 849.7
per 100,000. Comparing the sex-specific mortality rates across the
years given in Table 3–10, the mortality rate appears to have
increased for women. Does this make sense, or could there be another
explanation? Consider that a larger number of older women may have
been living in 1996 than in previous years. This hypothesis can
be examined by adjusting the mortality rates for the age of people
at risk. When age-adjusted rates are examined in Table 3–10,
we see that the rates have been declining as we would expect. We
talk more about adjusting rates in the section of that title.

Cause-specific mortality rates measure
deaths in a population from a specific disease or adverse event.
Comparing cause-specific mortality rates over a period of time helps
epidemiologists to determine possible predisposing factors in the
development of disease as well as to make projections about future
trends.

Other commonly used mortality rates are infant mortality rate
and case fatality rate. The infant mortality rate, sometimes used
as an indicator of the level of general health care in a population,
is the number of infants who die before 1 year of age per 1000 live
births. The case fatality rate is the number of deaths from a specific
disease occurring in a given period divided by the number of individuals
with the specified disease during the same period.
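
These definitions translate directly into arithmetic. The counts in
this sketch are hypothetical, chosen only to illustrate the two
formulas just given:

```python
# Mortality-rate sketches; all counts here are hypothetical and chosen
# only to illustrate the formulas.
def rate_per(base, events, population):
    """Events divided by the population at risk, scaled by a base."""
    return events / population * base

# Infant mortality rate: deaths before age 1 per 1000 live births.
infant_mortality = rate_per(1000, events=120, population=20_000)
print(round(infant_mortality, 1))  # 6.0 per 1000 live births

# Case fatality rate: deaths from a specific disease divided by the
# number of individuals with the disease during the same period.
case_fatality = 15 / 400
print(round(case_fatality * 100, 2))  # 3.75 (%)
```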

Morbidity Rates

Morbidity rates are similar to mortality rates, but many epidemiologists
think they provide a more direct measure of health status in a population.
The morbidity rate is the number of
individuals who develop a disease in a given period of time divided
by the number of people in the population at risk.

Prevalence and incidence are two important measures frequently
used in medicine and epidemiology. Prevalence is
defined as the number of individuals with a given disease at a given
point in time divided by the population at risk for that disease
at that time. Incidence is defined
as the number of new cases that have occurred during a given interval
of time divided by the population at risk at the beginning of the
time interval. (Because prevalence does not involve a period of
time, it is actually a proportion, but is often mistakenly termed
a rate.) The term "incidence" is sometimes used
erroneously when the term "prevalence" is meant.
One way to distinguish between them is to look for units: An incidence
rate should always be expressed in terms of a unit of time.

We can draw an analogy between prevalence and incidence and two
of the study designs discussed in Chapter 2. Prevalence is like
a snapshot in time, as is a cross-sectional study. In fact, some
cross-sectional studies are called prevalence studies by epidemiologists.
Incidence, on the other hand, requires a period of time to transpire,
similar to cohort studies. Recall that cohort studies begin at a
given time and continue to examine outcomes over the specified length
of the study.

Epidemiologists use prevalence and incidence rates to evaluate
disease patterns and make future projections. For example, diabetes
mellitus has an increasing prevalence even though the annual incidence
rate of approximately 230 cases per 100,000 has remained relatively
stable over the past several years. The reason for the difference
is that once this disease occurs, an individual continues to have
diabetes the remainder of his or her life; but advances in care
of diabetic patients have led to greater longevity for these patients.
In contrast, for diseases with a short duration (eg, influenza)
or with an early mortality (eg, pancreatic cancer), the incidence
rate is generally larger than the prevalence.
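
A small sketch makes the distinction concrete. The population and case
counts below are hypothetical; only the roughly 230 per 100,000 annual
incidence figure echoes the diabetes example above:

```python
# Prevalence is a proportion at a point in time; incidence counts new
# cases over a period. All counts here are hypothetical illustrations.
population_at_risk = 1_000_000
existing_cases = 65_000        # all current cases at one point in time
new_cases_this_year = 2_300    # new cases during a 1-year interval

prevalence = existing_cases / population_at_risk
incidence_rate = new_cases_this_year / population_at_risk * 100_000

print(prevalence)                # 0.065 (no time unit: a proportion)
print(round(incidence_rate, 1))  # 230.0 per 100,000 per year
```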

Adjusting Rates

We can use crude rates to make comparisons between two different
populations only if the populations are similar in all characteristics
that might affect the rate. For example, if the populations are
different or confounded by factors
such as age, gender, or race, then age-, gender-, or race-specific
rates must be used, or the crude rates must be adjusted; otherwise,
comparisons will not be valid.

Rates in medicine are commonly adjusted for age. Often, two populations
of interest have different age distributions; yet many characteristics
studied in medicine are affected by age, becoming either more or
less frequent as individuals grow older. If the two populations
are to be compared, the rates must be adjusted to reflect what they
would be had their age distributions been similar.

Direct Method
of Adjusting Rates

As an illustration, suppose a researcher compares the infant
mortality rates from a developed country with those from a developing
country and concludes that the mortality rate in the developing
country is almost twice as high as the rate in the developed country.
Is this conclusion misleading? Are confounding factors that affect
infant mortality distributed differently in the two countries? A
relationship between birthweight and mortality
certainly exists, and in this example, a valid comparison of mortality
rates requires that the distribution of birthweight be similar in
the two countries. Hypothetical data are given in Table 3–11.

The crude infant mortality rate for the developed country is
12.0 per 1000 infants; for the developing country, it is 23.9 per
1000. The specific rates for the developing country are higher in
all birthweight categories. However, the two distributions of birthweight
are not the same: The percentage of low-birthweight infants (<
2500 g) is more than twice as high in the developing country as
in the developed country. Because birthweight of infants and infant
mortality are related, we cannot determine how much of the difference
in crude mortality rates between the countries is due to differences
in weight-specific mortality and how much is due to the developing
country's higher proportion of low-birthweight babies.
In this case, the mortality rates must be standardized or adjusted
so that they are independent of the distribution of birthweight.^{c}

^{c}Of course factors other than birthweight may affect
mortality, and it is important to remember that correcting for one
factor may not correct for others.

Determining an adjusted rate is
a relatively simple process when information such as that in Table
3–11 is available. For each population, we must know the
specific rates. Note that the crude rate in each country is actually
a weighted average of the specific
rates, with the number of infants born
in each birthweight category used as the weights. For
example, the crude mortality rate in the developed country is 2400/200,000 = 0.012,
or 12 per 1000, and is equal to

    (20 × 43.5 + 30 × 16.0 + 150 × 7.0)/200 = 2400/200 = 12.0 per 1000

Table 3–11. Infant Mortality Rate Adjustment: Direct Method.

                      Developed Country                  Developing Country
                 Infants Born        Deaths         Infants Born        Deaths
Birthweight    N (in 1000s)   %    No.    Rate    N (in 1000s)   %    No.    Rate
< 1500 g            20        10    870   43.5         30        21   1860   62.0
1500–2499 g         30        15    480   16.0         45        32    900   20.0
≥ 2500 g           150        75   1050    7.0         65        47    585    9.0
Total              200       100   2400   12.0        140       100   3345   23.9

Because the goal of adjusting rates is to have them reflect similar
distributions, the numbers in each category from one population,
called the reference population, are
used as the weights to form weighted averages for both populations.
Which population is chosen as the standard does not matter; in fact,
a set of frequencies corresponding to a totally separate reference
population may be used. The point is that the same set of numbers
must be applied to both populations.

For example, if the numbers of infants born in each birthweight
category in the developed country are used as the standard and applied
to the specific rates in the developing country, we obtain

    (20 × 62.0 + 30 × 20.0 + 150 × 9.0)/200 = 3190/200 = 15.95 per 1000

The crude mortality rate in the developing country would therefore
be 15.95 per 1000 (rather than 23.9 per 1000) if the proportions
of infant birthweight were distributed as they are in the developed
country.

To use this method of adjusting rates, you must know the specific
rates for each category in the populations to be adjusted and the
frequencies in the reference population for the factor being adjusted.
This method is known as the direct method
of rate standardization.
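
The direct method is simply a weighted average of one population's
specific rates with the reference population's frequencies as weights.
A sketch using the Table 3–11 figures:

```python
# Direct rate standardization (Table 3-11): apply the developing
# country's birthweight-specific rates to the developed (reference)
# country's birthweight distribution.
infants_developed = {"<1500 g": 20, "1500-2499 g": 30, ">=2500 g": 150}     # in 1000s
rates_developing = {"<1500 g": 62.0, "1500-2499 g": 20.0, ">=2500 g": 9.0}  # per 1000

total = sum(infants_developed.values())
adjusted = sum(infants_developed[k] * rates_developing[k]
               for k in infants_developed) / total
print(adjusted)  # 15.95 per 1000, versus the crude 23.9 per 1000
```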

Indirect Method
of Adjusting Rates

Sometimes specific rates are not available in the populations
being compared. If the frequencies of the adjusting factor, such
as age or birthweight, are known for each population, and any set
of specific rates is available (either for one of the populations
being compared or for still another population), an indirect method
may be used to adjust rates. The indirect method results in the standardized mortality ratio, defined
as the number of observed deaths divided by the number of expected
deaths.

To illustrate, suppose the distribution of birthweight is available
for both the developed and the developing countries, but we have
specific death rates only for another population, denoted the Standard Population in Table 3–12.
The expected number of deaths is calculated in each population
by using the specific rates from the standard population. For the
developed country, the expected number of deaths is

    (20 × 50.0) + (30 × 20.0) + (150 × 10.0) = 1000 + 600 + 1500 = 3100

                     Infants Born (N in 1000s)    Specific Death Rates per 1000
Birthweight          Developed     Developing     in Standard Population
< 1500 g                 20            30                  50.0
1500–2499 g              30            45                  20.0
≥ 2500 g                150            65                  10.0
Number of deaths       2400          3345

In the developing country, the expected number of deaths is

    (30 × 50.0) + (45 × 20.0) + (65 × 10.0) = 1500 + 900 + 650 = 3050

The standardized mortality ratio (the observed number of deaths divided
by the expected number) for the developed country is 2400/3100 = 0.77.
For the developing country, the standardized mortality ratio is 3345/3050 = 1.1.
If the standardized mortality ratio is greater than 1, as in the developing
country, the population of interest has a mortality rate greater than that
of the standard population. If the standardized mortality ratio is less
than 1, as in the developed country, the mortality rate is less than
that of the standard population. Thus, the indirect method allows
us to make a relative comparison; in contrast, the direct method
allows us to make a direct comparison. If rates for one of the populations
of interest are known, these rates may be used; then the standardized
mortality ratio for this population is 1.0.
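
The indirect method can be checked the same way, using the standard
population's specific rates and each country's birthweight frequencies
from Table 3–12:

```python
# Indirect standardization: expected deaths come from the standard
# population's birthweight-specific rates (Table 3-12); the standardized
# mortality ratio (SMR) is observed deaths / expected deaths.
standard_rates = {"<1500 g": 50.0, "1500-2499 g": 20.0, ">=2500 g": 10.0}  # per 1000
infants = {  # numbers of infants born, in 1000s
    "developed": {"<1500 g": 20, "1500-2499 g": 30, ">=2500 g": 150},
    "developing": {"<1500 g": 30, "1500-2499 g": 45, ">=2500 g": 65},
}
observed = {"developed": 2400, "developing": 3345}

smr = {}
for country, counts in infants.items():
    expected = sum(counts[k] * standard_rates[k] for k in counts)
    smr[country] = observed[country] / expected
    print(country, expected, round(smr[country], 2))
# developed 3100.0 0.77
# developing 3050.0 1.1
```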

Tables &
Graphs for Nominal & Ordinal Data

We describe some of the more common methods for summarizing nominal
and ordinal data in this section. To illustrate how to construct
tables for nominal data, consider the observations given
in Table 3–13 for the 34 patients
with acquired hemophilia (Bossi, 1998) in Presenting Problem 4.
The simplest way to present nominal data (or ordinal data, if there are
not too many points on the scale) is to list the categories in one
column of the table and the frequency (counts)
or percentage of observations in another column. Table 3–14
shows a simple way of presenting data for the number of patients
who did or did not have hematuria at the time of diagnosis of their
hemophilia.

Table 3–13. Data on 34 Patients with Acquired Hemophilia Due to Factor VIII.

ID   Age   Sex     Ecchymoses   Hematoma   Hematuria   Factor VIII   RBC Units >5
 1    70   Men        Yes          No         Yes           5.0          Yes
 2    70   Women      Yes          Yes        Yes           0.0          No
 3    75   Women      Yes          Yes        No            1.0          Yes
 4    93   Women      Yes          Yes        No            5.0          No
 5    69   Men        No           No         Yes           5.0          No
 6    85   Men        Yes          Yes        No            6.0          Yes
 7    80   Women      No           No         No            1.0          No
 8    26   Women      Yes          Yes        Yes           2.5          No
 9    33   Women      Yes          Yes        Yes           3.5          Yes
10    81   Men        Yes          Yes        No            1.3          No
11    42   Women      Yes          Yes        No           30.0          No
12    74   Men        Yes          Yes        Yes           0.0          Yes
13    55   Men        Yes          Yes        Yes           3.0          Yes
14    86   Women      Yes          Yes        Yes           3.0          No
15    71   Men        Yes          Yes        No            6.0          No
16    89   Men        Yes          Yes        No            5.0          Yes
17    81   Women      Yes          Yes        No            0.0          Yes
18    82   Women      Yes          Yes        Yes           1.0          No
19    82   Women      Yes          Yes        No            3.0          No
20    71   Women      Yes          Yes        Yes           1.0          Yes
21    32   Women      Yes          Yes        Yes           2.0          Yes
22    30   Women      Yes          Yes        No            2.0          Yes
23    29   Women      Yes          Yes        No            0.0          No
24    78   Men        Yes          Yes        No           13.0          No
25    58   Men        Yes          Yes        Yes           1.0          No
26    26   Women      Yes          Yes        No            0.0          Yes
27    51   Men        Yes          Yes        No            3.0          No
28    69   Men        Yes          Yes        No            0.0          Yes
29    67   Men        Yes          Yes        No            1.0          No
30    44   Men        Yes          Yes        No            3.0          No
31    59   Women      Yes          Yes        No            3.0          Yes
32    59   Women      Yes          Yes        No            6.0          No
33    40   Men        Yes          Yes        Yes           3.0          Yes
34    22   Women      Yes          Yes        No            1.0          No

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J,
Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to
factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

Table 3–14. Contingency Table for Frequency of Hematuria in Patients
with Acquired Hemophilia.

Hematuria    Number of Patients
Yes                  13
No                   21

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J,
Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to
factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

When two characteristics on a nominal scale are examined, a common
way to display the data is in a contingency
table, in which observations are classified according to several
factors. Suppose we want to know the number of men and women who
had hematuria at the time of diagnosis. The first step is to list
the categories to appear in the table: men with and without hematuria
and women with and without hematuria (Table 3–15). Tallies
are placed for each patient who meets the criterion. Patient 1 has
a tally in the cell "Men with hematuria"; patient
2 has a tally in the cell "Women with hematuria"; and
so on. Tallies for the first seven patients are listed in Table
3–15.

Table 3–15. Step 1 in Constructing Contingency Table for Men and for
Women with and without Hematuria.

Category                   Tally
Men with hematuria          //
Men without hematuria       /
Women with hematuria        /
Women without hematuria     ///

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J,
Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to
factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

The sum of the tallies in each cell is then used to construct
a contingency table such as Table 3–16, which contains
cell counts for all 34 patients in the study. Percentages are often
given along with the cell counts.
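
Tallying a contingency table is easy to automate. This sketch counts
the (sex, hematuria) pairs for the first seven patients of Table 3–13,
matching the tallies in Table 3–15; running it over all 34 patients
would give the cell counts in Table 3–16:

```python
from collections import Counter

# (sex, hematuria) pairs for the first seven patients in Table 3-13.
patients = [("Men", "Yes"), ("Women", "Yes"), ("Women", "No"),
            ("Women", "No"), ("Men", "Yes"), ("Men", "No"), ("Women", "No")]

table = Counter(patients)
print(table[("Men", "Yes")])   # 2
print(table[("Women", "No")])  # 3
```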

Table 3–16. Contingency Table for Men and for Women with and without
Hematuria.

Sex      No Hematuria    Hematuria
Men            9              6
Women         12              7

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J,
Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to
factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

For a graphic display of nominal or ordinal data, bar charts
are commonly used. In a bar chart, counts
or percentages of the characteristic in different categories are
shown as bars. The investigators in this example could have used
a bar chart to present the number of patients with and without hematuria,
as illustrated in Figure 3–9. The categories of hematuria
(yes or no) are placed along the horizontal, or X-axis,
and the number of patients along the vertical, or Y-axis. Bar charts may also have the
categories along the vertical axis and the numbers along the horizontal
axis.

Figure 3-9.

Illustration of a bar chart. (Data, used with permission,
from Bossi P, Cabane J, Ninet J, Dhote R, Hanslik T, Chosidow O,
et al: Acquired hemophilia due to factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.
Graph produced with SPSS, a registered trademark of SPSS, Inc.;
used with permission.)

Other graphic devices such as pie charts and pictographs are
often used in newspapers, magazines, and advertising brochures.
They are occasionally used in the health field to display such resource
information as the portion of the gross national product devoted
to health expenditures or the geographic distribution of primary-care
physicians.

Describing Relationships
between Two Characteristics

Much of the research in medicine concerns the relationship between
two or more characteristics. The following discussion focuses on
examining the relationship between two variables measured on the
same scale when both are numerical, both are ordinal, or both are
nominal.

The Relationship
Between Two Numerical Characteristics

In Presenting Problem 2, Hébert and colleagues (1997)
wanted to estimate the relationship between the scores patients
had at different administrations of the Functional Autonomy Measurement
System (SMAF). The correlation coefficient (sometimes
called the Pearson product moment correlation coefficient, named
for the statistician who defined it) is one measure of the relationship
between two numerical characteristics, symbolized by X and Y. Table
3–17 gives the information needed to calculate the correlation
between the mental function scores at baseline and at the end of
2 years for women 85 years old and older (for the 51 subjects who
had both of these measures). The formula for the correlation coefficient,
symbolized by r, is

    r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² Σ(Y – Ȳ)²]

Table 3–17. Calculation for Correlation Coefficient between Mental
Ability at Time 1 (X) and Time 3 (Y) for Women Patients 85 Years of
Age or Older.^{a}

Patient       X          Y     (X – X̄)   (Y – Ȳ)   (X – X̄)²   (Y – Ȳ)²   (X – X̄)(Y – Ȳ)
1          6.0000     4.0000    4.4118    1.9020    19.4640     3.6176       8.3912
2          0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
3          1.0000     0.0000   –0.5882   –2.0980     0.3460     4.4016       1.2340
22         2.0000     3.0000    0.4118    0.9020     0.1696     0.8136       0.3714
24         1.0000     0.0000   –0.5882   –2.0980     0.3460     4.4016       1.2340
42         2.0000     8.0000    0.4118    5.9020     0.1696    34.8336       2.4304
72         3.0000     4.0000    1.4118    1.9020     1.9932     3.6176       2.6852
103        1.0000     1.0000   –0.5882   –1.0980     0.3460     1.2056       0.6458
114        2.0000     0.0000    0.4118   –2.0980     0.1696     4.4016      –0.8640
121        0.0000     2.0000   –1.5882   –0.0980     2.5224     0.0096       0.1556
122        2.0000     3.0000    0.4118    0.9020     0.1696     0.8136       0.3714
123        1.0000     0.0000   –0.5882   –2.0980     0.3460     4.4016       1.2340
132        0.0000     4.0000   –1.5882    1.9020     2.5224     3.6176      –3.0208
151        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
159        8.0000     9.0000    6.4118    6.9020    41.1112    47.6376      44.2542
161        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
162        0.0000     7.0000   –1.5882    4.9020     2.5224    24.0296      –7.7854
173        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
183        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
188        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
220        0.0000     1.0000   –1.5882   –1.0980     2.5224     1.2056       1.7438
237        7.0000     1.0000    5.4118   –1.0980    29.2876     1.2056      –5.9422
241        3.0000     2.0000    1.4118   –0.0980     1.9932     0.0096      –0.1384
251        0.0000     2.0000   –1.5882   –0.0980     2.5224     0.0096       0.1556
266        3.0000     5.0000    1.4118    2.9020     1.9932     8.4216       4.0970
273        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
332        1.0000     0.0000   –0.5882   –2.0980     0.3460     4.4016       1.2340
347        0.0000     1.0000   –1.5882   –1.0980     2.5224     1.2056       1.7438
348        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
376        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
377        3.0000    12.0000    1.4118    9.9020     1.9932    98.0496      13.9796
396        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
425        5.0000     4.0000    3.4118    1.9020    11.6404     3.6176       6.4892
472        1.0000     1.0000   –0.5882   –1.0980     0.3460     1.2056       0.6458
501        3.0000     1.0000    1.4118   –1.0980     1.9932     1.2056      –1.5502
518        1.0000     3.0000   –0.5882    0.9020     0.3460     0.8136      –0.5306
526        1.0000     0.0000   –0.5882   –2.0980     0.3460     4.4016       1.2340
527        1.0000     1.0000   –0.5882   –1.0980     0.3460     1.2056       0.6458
531        2.0000     0.0000    0.4118   –2.0980     0.1696     4.4016      –0.8640
592        8.0000    11.0000    6.4118    8.9020    41.1112    79.2456      57.0778
604        3.0000     1.0000    1.4118   –1.0980     1.9932     1.2056      –1.5502
628        1.0000     1.0000   –0.5882   –1.0980     0.3460     1.2056       0.6458
634        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
638        1.0000     1.0000   –0.5882   –1.0980     0.3460     1.2056       0.6458
706        4.0000     3.0000    2.4118    0.9020     5.8168     0.8136       2.1754
714        0.0000     1.0000   –1.5882   –1.0980     2.5224     1.2056       1.7438
722        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
748        0.0000     1.0000   –1.5882   –1.0980     2.5224     1.2056       1.7438
755        1.0000     6.0000   –0.5882    3.9020     0.3460    15.2256      –2.2952
792        0.0000     0.0000   –1.5882   –2.0980     2.5224     4.4016       3.3320
793        3.0000     3.0000    1.4118    0.9020     1.9932     0.8136       1.2734
Sum       81.0000   107.0000    0.0018    0.0020   220.3529   428.5098     179.0588

^{a}Values are reported to four decimal places
to minimize round-off error.

Source: Data, used with permission
of the authors and the publisher, from Hébert R, Brayne
C, Spiegelhalter D: Incidence of functional decline and improvement
in a community-dwelling, very elderly population. Am J Epidemiol
1997;145:935–944. Table produced with NCSS; used with permission.

As with the standard deviation, we give the formula and computation
for illustration purposes only and, for that reason, use the definitional
rather than the computational formula. Using the data from Table
3–17, we obtain a correlation of

    r = 179.0588 / √(220.3529 × 428.5098) = 179.0588/307.28 = 0.58
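
The same arithmetic, using the column sums from Table 3–17:

```python
from math import sqrt

# Pearson correlation assembled from the column sums in Table 3-17.
sum_dev_xy = 179.0588   # sum of (X - Xbar)(Y - Ybar)
sum_dev_xx = 220.3529   # sum of (X - Xbar)^2
sum_dev_yy = 428.5098   # sum of (Y - Ybar)^2

r = sum_dev_xy / sqrt(sum_dev_xx * sum_dev_yy)
print(round(r, 2))       # 0.58
print(round(r ** 2, 2))  # 0.34, the coefficient of determination
```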

Interpreting
Correlation Coefficients

What does a correlation of 0.58 between mental functioning at
time 1 and 2 years later mean? (Correlations are generally reported
to two decimal places.) Chapter 8 discusses methods used to tell
whether a statistically significant relationship exists; for now,
we will discuss some characteristics of the correlation coefficient
that will help us interpret its numerical value.

The correlation coefficient always ranges from –1 to +1,
with –1 describing a perfect negative linear (straight-line)
relationship and +1 describing a perfect positive linear
relationship. A correlation of 0 means no linear relationship exists
between the two variables.

Sometimes the correlation is squared (r^{2})
to form a useful statistic called the coefficient
of determination or r-squared, and
we recommend this practice. For the mental functioning data, the
coefficient of determination is (0.58)^{2}, or 0.34. This
means that 34% of the variability in one of the measures,
such as mental functioning at 2 years, may be accounted for (or
predicted) by knowing the value of the other measure, mental functioning
at baseline. Stated another way, if we know the value of an elderly
woman's score on the mental functioning part of the SMAF
and take that into consideration when examining the score 2 years
later, the variance (standard deviation squared) of the score after
2 years would be reduced by 34%, or about one-third.
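The computation of the correlation coefficient and the coefficient of determination can be sketched in a few lines of Python. The code below uses the definitional formula described in the text; the score pairs are hypothetical, since the full data of Table 3–17 are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation by the definitional formula: the sum of
    products of deviations from the means, divided by the square root
    of the product of the sums of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) *
                    sum((yi - mean_y) ** 2 for yi in y))
    return num / den

# Hypothetical pairs of mental functioning scores (baseline, 2 years later)
baseline = [2, 5, 1, 4, 7, 3]
later = [3, 6, 2, 3, 8, 4]
r = pearson_r(baseline, later)
r_squared = r ** 2  # coefficient of determination
```

Note that the denominator divides out the scale of each variable, which is why, as discussed below, the correlation is unaffected by the units of measurement.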

Several other characteristics of the correlation coefficient
deserve mention. The value of the correlation coefficient is independent
of the particular units used to measure the variables. Suppose two
medical students measure the heights and weights of a group of preschool
children to determine the correlation between height and weight.
They measure the children's height in centimeters and record
their weight in kilograms, and they calculate a correlation coefficient
equal to 0.70. What would the correlation be if they had used inches
and pounds instead? It would, of course, still be 0.70, because
the denominator in the formula for the correlation coefficient adjusts for
the scale of the units.

The value of the correlation coefficient is markedly influenced
by outlying values, just as is the standard deviation. Thus the
correlation does not describe the relationship between two variables well
when the distribution of either variable is skewed or contains outlying
values. In this situation, a transformation of
the data that changes the scale of measurement and moderates the
effect of outliers (see Chapter 5) or the Spearman correlation can
be used.

People first learning about the correlation coefficient often
ask, "How large should a correlation be?" The
answer depends on the application. For example, when physical characteristics
are measured and good measuring devices are available, as in many
physical sciences, high correlations are possible. Measurement in
the biologic sciences, however, often involves characteristics that are
less well defined and measuring devices that are imprecise; in such
cases, lower correlations may occur. Colton (1974) gives the following
crude rule of thumb for interpreting the size of correlations:

Correlations from 0 to 0.25 (or –0.25) indicate
little or no relationship; those from 0.25 to 0.50 (or –0.25
to –0.50) indicate a fair degree of relationship; those
from 0.50 to 0.75 (or –0.50 to –0.75) a moderate
to good relationship; and those greater than 0.75 (or –0.75)
a very good to excellent relationship.

Colton cautions against correlations higher than 0.95 in the
biologic sciences because of the inherent variability in most biologic
characteristics. When you encounter a high correlation, you should
ask whether it is an error or an artifact or, perhaps, the result
of combining two populations (illustrated in Chapter 8). An example
of an artifact is when the number of pounds patients lose in the
first week of a diet program is correlated with the number of pounds
they lose during the entire 2-month program.

The correlation coefficient measures only a straight-line relationship;
two characteristics may, in fact, have a strong curvilinear relationship,
even though the correlation is quite small. Therefore, when you
analyze relationships between two characteristics, always plot the
data as we do in the section titled, "Graphs for Two Characteristics." A
plot will help you detect outliers and skewed distributions.

Finally, "correlation does not imply causation." The
statement that one characteristic causes another must be justified
on the basis of experimental observations or logical argument, not because
of the size of a correlation coefficient.

Use the CD-ROM [available only with the book] and find the correlation between shock index and
heart rate using the Kline and colleagues study (2002). Interpret
the correlation using the guidelines just described.

The Relationship
Between Two Ordinal Characteristics

The Spearman rank correlation, sometimes
called Spearman's rho (also named for the statistician
who defined it), is frequently used to describe the relationship
between two ordinal (or one ordinal and one numerical) characteristics.
It is also appropriate to use with numerical observations that are
skewed with extreme observations. The calculation of the Spearman
rank correlation, symbolized as r_{s},
involves rank-ordering the values on each of the characteristics
from lowest to highest; the ranks are then treated as though they
were the actual values themselves. Although the formula is simple when
no ties occur in the values, the computation is quite tedious. Because
the calculation is available on many computer programs, we postpone
its illustration until Chapter 8, where it is discussed in greater
detail.
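As an illustration of the ranking idea, the simple no-ties formula can be coded directly. The data below are hypothetical, chosen so that one extreme value barely changes the ranks; the computation assumes no tied values, the case the text describes as simple.

```python
def spearman_rho(x, y):
    """Spearman rank correlation for data with no tied values:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    difference between the ranks of paired observations."""
    n = len(x)
    rank_x = [sorted(x).index(v) + 1 for v in x]
    rank_y = [sorted(y).index(v) + 1 for v in y]
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical skewed data: the outlier 100 still ranks 5th,
# so it does not distort the result as it would for Pearson r
x = [1, 2, 3, 4, 100]
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)  # 0.8
```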

The Relationship
Between Two Nominal Characteristics

In
studies involving two characteristics, the primary interest
may be in whether they are significantly related (discussed in Chapter
6) or the magnitude of the relationship, such as the relationship
between a risk factor and occurrence of a given outcome. Two ratios
used to estimate such a relationship are the relative
risk and the odds ratio, both
often referred to as risk ratios. For
example, in Presenting Problem 3, the investigators may wish to
learn whether instruction in domestic violence reduces the "risk" that
physicians neglect to screen for this condition. In the context
of this discussion, we introduce some of the important concepts
and terms that are increasingly used in the medical and health literature,
including the useful notion of the number of patients who need to
be treated in order to observe one positive outcome.

Experimental
and Control Event Rates

Important concepts in the computation of measures of risk are
called the event rates. Using the notation in Table 3–18,
we are interested in the event of a disease occurring. The experimental event
rate (EER) is the proportion of people with the risk factor who
have or develop the disease, or A/(A + B).
The control event rate (CER) is the proportion of people without
the risk factor who have or develop the disease, or C/(C + D).
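Using the notation of Table 3–18, the event rates can be sketched as follows; the MI counts are those from the Physicians' Health Study presented later in this section.

```python
def event_rates(a, b, c, d):
    """EER and CER from a 2x2 table laid out as in Table 3-18:
    a = risk factor present, disease;  b = risk factor present, no disease
    c = risk factor absent, disease;   d = risk factor absent, no disease"""
    eer = a / (a + b)  # experimental event rate, A/(A + B)
    cer = c / (c + d)  # control event rate, C/(C + D)
    return eer, cer

# MI counts from the Physicians' Health Study (Table 3-19):
# 139 of 11,037 taking aspirin; 239 of 11,034 taking placebo
eer, cer = event_rates(139, 11_037 - 139, 239, 11_034 - 239)
# eer rounds to 0.0126, cer to 0.0217
```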

Table 3–18.
Table Arrangement and Formulas for Several Important Measures of
Risk.

                         Disease    No Disease    Total
Risk factor present      A          B             A + B
Risk factor absent       C          D             C + D
Total                    A + C      B + D

The Relative
Risk

The relative risk, or risk ratio, of a disease, symbolized by RR, is the ratio of the incidence in
people with the risk factor (exposed persons) to the incidence in people
without the risk factor (nonexposed persons). It can therefore be
found by dividing the EER by the CER.

The Physicians' Health Study (Steering Committee of
the Physicians' Health Study Research Group, 1989) is a
classic study undertaken to learn whether aspirin in low doses (325
mg every other day) reduces the mortality from cardiovascular disease.
The participants in this clinical trial were 22,071 healthy male
physicians who were randomly assigned to receive aspirin or placebo and
were evaluated over an average period of 60 months. Table 3–19
gives data on MI for physicians taking aspirin and physicians taking
a placebo. Physicians taking aspirin and those taking the placebo
were followed to learn the number in each group who had an MI. In
this study, taking aspirin assumes the role of the risk factor.
The EER is the incidence of MI in physicians who took aspirin, or
139/11,037 = 0.0126; the CER is the incidence
of MI in those who took a placebo, or 239/11,034 = 0.0217.
The relative risk of MI with aspirin, compared with MI with placebo,
is therefore RR = EER/CER = 0.0126/0.0217 = 0.58.

Table 3–19.
Confirmed Cardiovascular End Points in the Aspirin Component of
the Physicians' Health Study, According to Treatment Group.

End Point                          Aspirin Group   Placebo Group
Number of patients                 11,037          11,034
Myocardial infarction
  Fatal                            10              26
  Nonfatal                         129             213
  Total                            139             239
  Person-years of observation      54,560.0        54,355.7
Stroke
  Fatal                            9               6
  Nonfatal                         110             92
  Total                            119             98
  Person-years of observation      54,650.3        54,635.8

Source: Adapted and reproduced,
with permission, from Steering Committee of the Physicians' Health Study
Research Group: Final report on the aspirin component of the ongoing
Physicians' Health Study. N Engl J Med 1989; 321:129–135.

Because fewer MIs occurred among the group taking aspirin than
those taking the placebo, the relative risk is less than 1. If we
take the reciprocal and look at the relative risk of having an MI for
physicians in the placebo group, the relative risk is 1/0.58 = 1.72.
Thus, physicians in the placebo group were 1.7 times more likely
to have an MI than physicians in the aspirin group.

The relative risk is calculated only from a cohort study or a
clinical trial in which a group of subjects with the risk factor
and a group without it are first identified and then followed through
time to determine which persons develop the outcome of interest.
In this situation, the investigator determines the number of subjects
in each group.
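As a sketch, the relative-risk computation for the aspirin data reduces to a single division of the two event rates:

```python
def relative_risk(eer, cer):
    """Relative risk (risk ratio): incidence in exposed persons
    divided by incidence in nonexposed persons, RR = EER / CER."""
    return eer / cer

# Event rates for MI in the Physicians' Health Study (Table 3-19)
eer = 139 / 11_037  # incidence with aspirin
cer = 239 / 11_034  # incidence with placebo
rr = relative_risk(eer, cer)  # about 0.58
rr_reciprocal = 1 / rr        # about 1.72: risk of MI on placebo vs aspirin
```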

Absolute Risk
Reduction

The absolute risk reduction (ARR) provides
a way to assess the reduction in risk compared with the baseline
risk. In the physician aspirin study (see Table 3–19),
the experimental event rate for an MI from any cause was 0.0126 in
the aspirin group, and the control event rate was 0.0217 in the
placebo group. The ARR is the absolute value of the difference between
these two event rates: ARR = |0.0126 – 0.0217| = 0.0091.

A good way to interpret these numbers is to think about them
in terms of events per 10,000 people. Then the risk of MI is 126
in a group taking aspirin and 217 in a group taking placebo, and
the absolute risk reduction is 91 per 10,000 people.

Number Needed
to Treat

An added advantage of interpreting risk data in terms of absolute
risk reduction is that its reciprocal, 1/ARR, is the number needed to treat (NNT) in order
to prevent one event. The number of people that need to be treated
to avoid one MI is then 1/0.0091, or 109.9 (about 110 people).
This type of information helps clinicians evaluate the relative
risks and benefits of a particular treatment. Based on the risks
associated with taking aspirin, do you think it is a good idea to
prescribe aspirin for 110 people in order to prevent one of them
from having an MI? The articles by Glasziou and coworkers (1998)
and Sackett and coworkers (2000) contain excellent discussions of
this topic; Nuovo and coworkers (2002) discuss the need to include
number needed to treat in reports of clinical trials.
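The ARR and NNT computations described above can be sketched together, using the event rates given in the text:

```python
def arr_and_nnt(eer, cer):
    """Absolute risk reduction and number needed to treat:
    ARR = |EER - CER|; NNT = 1 / ARR."""
    arr = abs(eer - cer)
    return arr, 1 / arr

# Event rates from the physician aspirin study, as given in the text
arr, nnt = arr_and_nnt(0.0126, 0.0217)  # ARR = 0.0091, NNT about 110
```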

Absolute Risk
Increase and Number Needed to Harm

Some treatments or procedures increase the risk for a serious
undesirable side effect or outcome. In this situation, the (absolute
value of the) difference between the EER and the CER is termed the absolute risk increase (ARI). He and
colleagues (1998), in their report of a meta-analysis of randomized
trials of aspirin use, found an absolute risk reduction in MI of
137 per 10,000 persons, a result even larger than in the physician
aspirin study. They also looked at the outcome of stroke and reported
an absolute risk reduction in ischemic stroke of 39 in 10,000. Based
on their results, the NNT for the prevention of MI is 1/0.0137,
or 72.99 (about 73), and the NNT for the prevention of ischemic
stroke is 1/0.0039, or 256.41 (about 257). At the same
time, aspirin therapy resulted in an absolute risk increase in hemorrhagic
stroke of 12 in every 10,000 persons. The reciprocal of the absolute
risk increase, 1/ARI, is called the number
needed to harm (NNH). Based on the report by He and colleagues,
for hemorrhagic stroke the number needed to harm is 1/0.0012,
or 833. Based on these numbers, the authors concluded that the overall
benefits from aspirin therapy outweigh the risk for hemorrhagic
stroke.
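The number needed to harm mirrors the NNT computation; using the hemorrhagic stroke figure reported by He and colleagues:

```python
def nnh(ari):
    """Number needed to harm: the reciprocal of the
    absolute risk increase, NNH = 1 / ARI."""
    return 1 / ari

# He and colleagues (1998): hemorrhagic stroke increased
# by 12 per 10,000 persons with aspirin therapy
harm = nnh(12 / 10_000)  # about 833
```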

Relative Risk
Reduction

A related concept, the relative risk
reduction (RRR), is also presented in the literature. This
measure gives the amount of risk reduction relative to the baseline
risk; that is, the EER minus the CER all divided by the control
(baseline) event rate, CER. The RRR in the physician aspirin study
is

RRR = (0.0217 – 0.0126)/0.0217 = 0.0091/0.0217 = 0.4194,

or approximately 42%. The relative risk reduction tells
us that, relative to the baseline risk of 217 MIs in 10,000 people,
giving aspirin reduces the risk by 42%.

Many clinicians feel that the absolute risk reduction is a more
valuable index than the relative risk reduction because its reciprocal
is the number needed to treat. If a journal article gives only the relative
risk reduction, it can (fairly easily) be converted to the absolute
risk reduction by multiplying by the control event rate, a value
that is almost always given in an article. For instance, 0.4194 x 0.0217
is 0.0091, the same value we calculated earlier for the ARR.
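The RRR and the conversion back to the ARR can be sketched as:

```python
def rrr(eer, cer):
    """Relative risk reduction: the risk difference relative to
    the baseline (control) event rate, (CER - EER) / CER."""
    return (cer - eer) / cer

eer, cer = 0.0126, 0.0217
relative_reduction = rrr(eer, cer)  # about 0.42
# A reported RRR converts back to the ARR by multiplying by the CER
arr = relative_reduction * cer      # 0.0091, as calculated earlier
```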

The Odds Ratio

The odds ratio provides a way to look at risk in case–control
studies. To discuss the odds ratio, we use the study by Ballard
and coworkers (1998)^{d} in which the use of antenatal thyrotropin-releasing
hormone was studied. Data from this study are given in Table 3–20.
The odds ratio (OR) is the odds that
a person with an adverse outcome was at risk divided by the odds
that a person without an adverse outcome was at risk. The odds ratio
is easy to calculate when the observations are given in a 2 x 2
table. The numbers of infants developing respiratory distress syndrome
in Table 3–20 are rearranged and given in Table 3–21.

^{d}The authors could have presented the relative risk
because the study was a clinical trial, but they chose to give the
odds ratio as well. At one time, use of the odds ratio was generally
reserved for case–control studies. One of the statistical
methods used increasingly in medicine, logistic
regression, can be interpreted in terms of odds ratios. We
discuss this method in detail in Chapter 10. We discuss the issue
of statistical significance and risk ratios in Chapter 8.

Table 3–20.
Outcomes of Infants in the Thyrotropin-Releasing Hormone and Placebo
Groups.

                                  Infants at Risk                          Infants Not at Risk
Outcome                           TRH        Placebo    Odds Ratio        TRH        Placebo    Odds Ratio
                                  (N = 392)  (N = 377)  (95% CI)          (N = 171)  (N = 194)  (95% CI)
Respiratory distress syndrome     260        244        1.1 (0.8–1.5)     5          13         0.4 (0.1–1.3)
Death 28 days after delivery      43         42         1.0 (0.6–1.6)     2          1          2.3 (0.1–135)
Chronic lung disease or death
  28 days after delivery          175        157        1.1 (0.8–1.3)     3          2          1.7 (0.2–20.7)

Source: Data, used with
permission, from Table 2 in Ballard RA, Ballard PL, Cnaan A, Pinto-Martin
J, Davis DJ, Padbury JF, et al: Antenatal thyrotropin-releasing
hormone to prevent lung disease in preterm infants. N Engl J Med
1998;338:493–498.

Table 3–21.
Data for Odds Ratio for Infants at 32 Weeks or Fewer of Gestation.

Group     With Respiratory Distress   Without Respiratory Distress   Total
TRH       260                         132                            392
Placebo   244                         133                            377
Total     504                         265                            769

Source: Data, used with
permission, from Table 2 in Ballard RA, Ballard PL, Cnaan A, Pinto-Martin
J, Davis, DJ, Padbury JF, et al: Antenatal thyrotropin-releasing
hormone to prevent lung disease in preterm infants. N Engl J Med
1998;338:493–498.

In this study, the odds that an infant with respiratory distress
syndrome was exposed to TRH are 260/244 = 1.07, and the odds that
an infant without respiratory distress syndrome was exposed to TRH
are 132/133 = 0.99. Putting these two odds together to obtain the
odds ratio gives 1.07/0.99 = 1.1.
An odds ratio of 1.1 means that an infant in the TRH group is
1.1 times more likely to develop respiratory distress syndrome than
an infant in the placebo group. This risk does not appear to be much
greater, and Ballard and coworkers (1998) reported that the odds
ratio was not statistically significant.

The odds ratio is also called the cross-product ratio because
it can be defined as the ratio of the product of the diagonals in
a 2 x 2 table: OR = (A x D)/(B x C). For the data in Table 3–21,
this is (260 x 133)/(132 x 244) = 1.1.
In case–control studies, the investigator decides how
many subjects with and without the disease will be studied. This
is the opposite from cohort studies and clinical trials, in which
the investigator decides the number of subjects with and without
the risk factor. The odds ratio should therefore be used with case–control
studies.
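The cross-product form of the odds ratio is a one-line computation; the counts below are those from Table 3–21:

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table arranged as in
    Table 3-18: OR = (A x D) / (B x C)."""
    return (a * d) / (b * c)

# Respiratory distress syndrome counts from Table 3-21:
# TRH: 260 with RDS, 132 without; placebo: 244 with, 133 without
or_trh = odds_ratio(260, 132, 244, 133)  # about 1.1
```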

Readers interested in more detail are referred to the very readable
elementary text on epidemiology by Fletcher and colleagues (1996).
Information on other measures of risk used in epidemiology can be
found in Greenberg (2000).

Graphs for Two
Characteristics

Most studies in medicine involve more than one characteristic,
and graphs displaying the relationship between two characteristics
are common in the literature. No graphs are commonly used for displaying
a relationship between two characteristics when both are measured
on a nominal scale; the numbers are simply presented in contingency
tables. When one of the characteristics is nominal and the other
is numerical, the data can be displayed in box plots like the one
in Figure 3–6 or error plots, as in Figure 3–8.

Also common in medicine is the use of bivariate
plots (also called scatterplots or
scatter diagrams) to illustrate the relationship between two characteristics
when both are measured on a numerical scale. In the study by Hébert
and colleagues (1997), information was collected on the mental functioning
of each patient at three times, each 1 year apart. Box 3–1 contains
a scatterplot of mental functioning scores at times 1 and 3 for
women age 85 or older. A scatterplot is constructed by drawing X- and Y-axes;
the characteristic hypothesized to explain or predict or the one
that occurs first (sometimes called the risk factor) is placed on
the X-axis. The characteristic or outcome
to be explained or predicted or the one that occurs second is placed
on the Y-axis. In applications in which
a noncausal relationship is hypothesized, placement for the X- and Y-axes
does not matter. Each observation is represented by a small circle;
for example, the circle in the lower right in the graph in Box 3–1
represents subject 237, who had a score of 7 at baseline and a score
of 1 two years later. More information on interpreting scatterplots
is presented in Chapter 8, but we see here that the data in Box
3–1 suggest the possibility of a positive relationship
between the two scores. At this point, we cannot say whether the
relationship is significant or one that simply occurs by chance;
this topic is covered in Chapter 8.

Box 3–1. Illustration
of a Scatterplot.

Pearson Correlations

          Time 1     Time 3
Time 1    1.000000   0.582715
Time 3    0.582715   1.000000

Source: Data, used with
permission, from Hébert R, Brayne C, Spiegelhalter D: Incidence
of functional decline and improvement in a community-dwelling very
elderly population. Am J Epidemiol 1997;145:935–944. Plot produced
with NCSS; used with permission.

Some of you may notice that fewer data points occur in Box 3–1
than in Table 3–17. This happens when several data points
have the same value and therefore overlap in the plot. Both NCSS and SPSS have an option for using "sunflowers," in
which each sunflower petal stands for one observation.

Use the CD-ROM [available only with the book] to produce a scatterplot, and choose the sunflower
option. Do you think this is helpful in interpreting the plot?

As a final note, there is a correspondence between the size of
the correlation coefficient and a scatterplot of
the observations. We also included in Box 3–1 the output
from NCSS giving the correlation coefficient. Recall that a correlation
of 0.58 indicates a moderate to good relationship between the two
mental functioning scores. When the correlation is near 0, the shape
of the pattern of observations is more or less circular. As the
value of the correlation gets closer to +1 or –1,
the shape becomes more elliptical, until, at +1 and –1,
the observations fall directly on a straight line. With a correlation
of 0.58, we expect a scatterplot of the data to be somewhat oval-shaped,
as it is in Box 3–1.

Whether to use tables or graphs is generally based on the purpose
of the presentation of the data. Tables give more detail and can
display information about more characteristics, but they take longer
to read and synthesize. Graphs present an easy-to-interpret picture
of the data and are appropriate when the goal is to illustrate the
distribution or frequencies without specific details.

Examples of
Misleading Charts & Graphs

The quality of charts and graphs published in the medical literature
is higher than that in similar displays in the popular press. The
most significant problem with graphs (and tables as well) in medical
journal articles is their complexity. Many authors attempt to present
too much information in a single display, and it may take the reader
a long time to make sense of it.

The purpose of tables and graphs is to present information (often
based on large numbers of observations) in a concise way so that
observers can comprehend and remember it more easily. Charts, tables,
and graphs should be simple and easily understood by the reader,
and concise but complete labels and legends should accompany them.

Knowing about common errors helps you correctly interpret information
in articles and presentations. We illustrate four errors we have
seen with sufficient frequency to warrant their discussion. We use
hypothetical examples and do not imply that they necessarily occurred
in the presenting problems used in this text. If you are interested
in learning more about table and graph construction, a discussion
by Wainer (1992) makes recommendations for designing tables for
data. Spirer and colleagues (1998) provide an entertaining discussion
of graphs, and Briscoe (1996) has suggestions for improving all
types of presentations and posters, as well as publications.

A researcher can make a change appear more or less dramatic by
selecting a starting time for a graph, either before or after the
change begins. Figure 3–10A shows the decrease in annual
mortality from a disease, beginning in 1960 and continuing with
the projected mortality through 2010. The major decrease in mortality
from this disease occurred in the 1970s. Although not incorrect,
a graph that begins in 1980 (Figure 3–10B) deemphasizes
the decrease and implies that the change has been small.

Figure 3-10.

Illustration of effect of portraying change at two different
times. A: Mortality from a disease since 1960. B: Mortality from
a disease since 1980.

If the values on the Y-axis are
large, the entire scale cannot be drawn. For example, suppose an
investigator wants to illustrate the number of deaths from cancer,
beginning in 1960 (when there were 200,000 deaths) to the year 2010
(when 600,000 deaths are projected). Even if the vertical scale
is in thousands of deaths, it must range from 200 to 600. If the Y-axis is not interrupted, the implied
message is inaccurate; a misunderstanding of the scale makes the
change appear larger than it really is. This error, called suppression of zero, is common in
histograms and line graphs. Figure 3–11A illustrates the
effect of suppression of zero on the number of deaths from cancer
per year; Figure 3–11B illustrates the correct construction.
The error of suppression of zero is more serious on the Y-axis than on the X-axis,
because the scale on the Y-axis represents
the magnitude of the characteristic of interest. Many researchers
today use computer programs to generate their graphics. Some programs
make it difficult to control the scale of the Y-axis
(and the X-axis as well). As readers,
we therefore need to be vigilant and not be unintentionally misled
by this practice.

Figure 3-11.

Illustration of effect of suppression of zero on Y-axis
in graphs showing deaths from cancer. A: No break in the line on
Y-axis. B: Break in the line correctly placed on Y-axis.

The
magnitude of change can also be enhanced or minimized by
the choice of scale on the vertical axis. For example, suppose a
researcher wishes to compare the ages at death in a group of men and
a group of women. Figure 3–12A, by suppressing the scale,
indicates that the ages of men and women at death are similar; Figure
3–12B, by stretching the scale, magnifies the differences
in age at death between men and women.

Figure 3-12.

Illustration of effect of suppressing or stretching
the scale in plots showing age at death. A: Suppressing the scale.
B: Stretching the scale.

Our final example is a table that gives irrelevant percentages,
a somewhat common error. Suppose that the investigators are interested
in the relationship between levels of patient compliance and their
type of insurance coverage. When two or more measures are of interest,
the purpose of the study generally determines which measure is viewed
within the context of the other. Table 3–22A shows the
percentage of patients with different types of insurance coverage
within three levels of patient compliance, so the percentages in
each column total 100%. The percentages in Table 3–22A make sense if the investigator wishes to compare the type of insurance
coverage of patients who have specific levels of compliance; it
is possible to conclude, for example, that 35% of patients
with low levels of compliance have no insurance.

Table 3–22.
Effect of Calculating Column Percentages versus Row Percentages
for Study of Compliance with Medication versus Insurance Coverage.

A. Percentages Based on Level of Compliance (Column %)

                          Level of Compliance with Medication
Insurance Coverage        Low    Medium    High
Medicaid                  30     20        15
Medicare                  20     25        30
Medicaid and Medicare     5      5         5
Other insurance           10     30        40
No insurance              35     20        10

B. Percentages Based on Insurance Coverage (Row %)

                          Level of Compliance with Medication
Insurance Coverage        Low    Medium    High
Medicaid                  45     30        25
Medicare                  25     35        40
Medicaid and Medicare     33     33        33
Other insurance           15     35        50
No insurance              55     30        15

Contrast this interpretation with that obtained if percentages
are calculated within insurance status, as in Table 3–22B,
in which percentages in each row total 100%. From Table
3–22B, one can conclude that 55% of patients with
no insurance coverage have a low level of compliance. In other words,
the format of the table should reflect the questions asked in the
study. If one measure is examined to see whether it explains another
measure, such as insurance status explaining compliance, investigators
should present percentages within the explanatory measure.
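The difference between the two tables amounts to choosing the divisor for each cell. As a sketch with hypothetical counts (the published tables report percentages, not the underlying counts):

```python
def percentages(table, by):
    """Convert a table of counts to percentages: by='column' divides
    each cell by its column total (as in Table 3-22A); by='row'
    divides each cell by its row total (as in Table 3-22B)."""
    n_rows, n_cols = len(table), len(table[0])
    if by == 'column':
        totals = [sum(table[r][c] for r in range(n_rows))
                  for c in range(n_cols)]
        return [[100 * table[r][c] / totals[c] for c in range(n_cols)]
                for r in range(n_rows)]
    return [[100 * cell / sum(row) for cell in row] for row in table]

# Hypothetical counts: rows = two insurance categories,
# columns = low/medium/high compliance
counts = [[30, 20, 15],
          [55, 30, 15]]
col_pct = percentages(counts, by='column')  # each column sums to 100
row_pct = percentages(counts, by='row')     # each row sums to 100
```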

Computer Programs

As you have already seen, we give examples of output from computer packages especially designed
to analyze statistical data. As much as possible, we reproduce the
actual output obtained in analyzing observations from the presenting
problems, even though the output frequently contains statistics
not yet discussed. We discuss the important aspects of the output,
and, for the time being, you can simply ignore unfamiliar information
on the printout; in subsequent chapters, we will explain many of
the statistics. Statistical computer programs are designed to meet
the needs of researchers in many different fields, so some of the
statistics in the printouts may rarely be used in medical studies
and hence are not included in this book. We use the output from
several comprehensive statistical programs in this text, including
NCSS, SPSS, and JMP. For the most part, we concentrate on the first
two packages. In later chapters we also illustrate programs for
estimating the sample size needed for a study.

Summary

This chapter presents an important biostatistical principle:
the different scales of measurement influence the methods for summarizing
and displaying information. Some of the summary measures we introduce
in this chapter form the basis of statistical tests illustrated
in subsequent chapters.

The simplest level of measurement is a nominal scale, also called
a categorical, or qualitative, scale. Nominal scales measure characteristics
that can be classified into categories; the number of observations
in each category is counted. Proportions, ratios, and percentages
are commonly used to summarize categorical data. Nominal characteristics
are displayed in contingency tables and in bar charts.

Ordinal scales are used for characteristics that have an underlying
order. The differences between values on the scale are not equal
throughout the scale. Examples are many disease staging schemes,
which have four or five categories corresponding to the severity
of the disease. Medians, percentiles, and ranges are the summary
measures of choice because they are less affected by outlying measurements.
Ordinal characteristics, like nominal characteristics, are displayed
in contingency tables and bar charts.

Numerical scales are the highest level of measurement; they are
also called interval, or quantitative, scales. Characteristics measured
on a numerical scale can be continuous (taking on any value on the
number line) or discrete (taking on only integer values).

We recommend that the mean be used with observations that have
a symmetric distribution. The median, also a measure of the middle,
is used with ordinal observations or numerical observations that
have a skewed distribution. When the mean is appropriate for describing
the middle, the standard deviation is appropriate for describing
the spread, or variation, of the observations. The value of the
standard deviation is affected by outlying or skewed values, so
percentiles or the interquartile range should be used with observations
for which the median is appropriate. The range gives information
on the extreme values, but alone it does not provide insight into
how the observations are distributed.

An easy way to determine whether the distribution of observations
is symmetric or skewed is to create a histogram or box plot. Other
graphic methods include frequency polygons or line graphs, and error
plots. Although each method provides information on the distribution
of the observations, box plots are especially useful as concise
displays because they show at a glance the distribution of the values.
Stem-and-leaf plots combine features of frequency tables and histograms; they
show the frequencies as well as the shape of the distribution.
Frequency tables summarize numerical observations; the scale is
divided into classes, and the number of observations in each class
is counted. Both frequencies and percentages are commonly used in
frequency tables.

When measurements consist of one nominal and one numerical characteristic,
frequency polygons, box plots, and error plots illustrate the distribution
of numerical observations for each value of the nominal characteristic.

The correlation coefficient indicates the degree of the relationship
between the two characteristics on the same group of individuals.
Spearman's rank correlation is used with skewed or ordinal observations.
When the characteristics are measured on a nominal scale and proportions
are calculated to describe them, the relative risk or the odds ratio
may be used to measure the relationship between two characteristics.

We
used data from the study by Kline and colleagues (2002) to
illustrate the calculation of common statistics for summarizing
data, such as the mean, median, and standard deviation, and to provide
some useful ways of displaying data in graphs. The study, conducted
in the emergency departments of seven urban hospitals, involved
the prospective collection of data from 934 patients who had undergone
a pulmonary vascular imaging study (contrast-enhanced CT scan of the
chest or a ventilation/perfusion lung scan [V/Q
scan]) because of the clinical suspicion of a pulmonary
embolism (PE). A final diagnosis of PE was established using a
combination
of vascular imaging studies plus use of other objective tests and
telephone follow-up at 6 months. Two medical students interviewed
each patient independently to collect clinical data that were analyzed
using multivariate logistic regression analysis (discussed in Chapter
10) to select six variables (age, shock index, unexplained hypoxemia,
unilateral leg swelling, recent surgery, and hemoptysis) significantly
associated with the presence of PE. They constructed a "decision
rule" using the six variables to define a high-risk group
of patients with a 40% pretest probability of PE.

Hébert and colleagues (1997) in Presenting Problem 2
focused on disability and functional changes in the elderly. We
used observations on subjects 85 years of age or older to illustrate stem-and-leaf
plots and box plots. Hébert and colleagues reported that
baseline SMAF scores indicated that women were significantly more
disabled than men for activities of daily living, mobility, and
mental function. Women were more independent in instrumental activities
of daily living (housekeeping, meal preparation, shopping, medication
use, budgeting). Generally, subjects showed significant declines
in all areas of functioning between baseline interview and the second
interview 1 year later. Functional decline was associated with age,
but not with sex. Interestingly, the functional score declines were
not significant (except for a slight decline in instrumental activities
of daily living) between the second and third interviews. The authors
proposed three explanations to account for this phenomenon: floor
effect, survival effect, and regression toward the mean—topics
discussed later in this text. Disability is one of the important
outcome measures in studies of the elderly population. We also examined
the relationship between SMAF scores at baseline and 2 years later
in this study and found a moderate to good relationship between
these measures.

The results of the study by Lapidus and colleagues (2002) on
screening for domestic violence (Presenting Problem 3) were used
to illustrate that proportions and percentages can be used interchangeably
to describe the relationship of a part to the whole; ratios relate
the two parts themselves. When a proportion is calculated over time,
the result is called a rate. Some of the rates commonly used in
medicine were defined and illustrated. For comparison of rates from
two different populations, the populations must be similar with
respect to characteristics that might affect the rate; adjusted
rates are necessary when these characteristics differ between the
populations. In medicine, rates are frequently adjusted for disparities
in age. Contingency tables display two nominal characteristics measured
on the same set of subjects. Bar charts are an effective way to illustrate
nominal data.
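A small sketch with entirely hypothetical populations shows why adjustment matters: two towns with identical age-specific death rates can have very different crude rates when their age structures differ, while direct adjustment to a shared standard population makes them agree.

```python
# Entirely hypothetical populations: Town B is older than Town A, but the
# age-specific death rates are identical in every stratum.
# Each entry: age stratum -> (deaths, population)
town_a = {"<40": (10, 5000), "40-64": (30, 3000), "65+": (100, 2000)}
town_b = {"<40": (4, 2000), "40-64": (30, 3000), "65+": (250, 5000)}
standard = {"<40": 4000, "40-64": 3000, "65+": 3000}  # shared standard population

def crude_rate(town):
    """Deaths per 1000 population, ignoring age structure."""
    deaths = sum(d for d, n in town.values())
    pop = sum(n for d, n in town.values())
    return 1000 * deaths / pop

def adjusted_rate(town, std):
    """Direct age adjustment: apply each stratum rate to the standard weights."""
    total = sum(std.values())
    return sum(1000 * (d / n) * std[s] / total for s, (d, n) in town.items())

print(crude_rate(town_a), crude_rate(town_b))  # crude rates differ
print(adjusted_rate(town_a, standard),
      adjusted_rate(town_b, standard))  # adjusted rates are equal
```

The crude rates differ only because Town B has proportionally more elderly residents; the age-adjusted rates reveal that the underlying mortality experience is the same.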

Acquired hemophilia is a rare, life-threatening disease caused
by development of autoantibodies directed against factor VIII. It
is often associated with an underlying disease. In Presenting Problem
4 Bossi and colleagues (1998) studied the characteristics and outcomes
of 34 patients who had this disease. The results from their study
help clinicians understand the presentation and clinical course
of acquired hemophilia. Treatments have included administration
of porcine factor VIII, immunosuppressive drugs, and intravenous
immunoglobulins. These researchers point out the need for randomized,
controlled studies of treatment.

The study by Ballard and coworkers (1998) found that the antenatal
administration of thyrotropin-releasing hormone had no effect on
the pulmonary outcome in these premature infants. No significant
differences occurred between the treatment and placebo groups in
the incidence of respiratory distress syndrome, death, or chronic
lung disease. We used the data to illustrate the odds ratio for
the development of respiratory distress, and our results (not significant)
agreed with those of the authors. The investigators concluded that
treatment with thyrotropin-releasing hormone is not indicated for
women at risk of delivering a premature infant.

Exercises

1. Show that the sum of the deviations from the mean is equal
to 0. Demonstrate this fact by finding the sum of the deviations
for heart rate variation in Table 3–3.
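A quick numerical check of this fact; the values below are hypothetical stand-ins, since Table 3–3 is not reproduced here.

```python
# Hypothetical observations (Table 3-3 itself is not reproduced here)
x = [28.5, 31.0, 44.2, 57.9, 62.4, 71.0]
mean = sum(x) / len(x)
deviations = [v - mean for v in x]
# The positive and negative deviations cancel exactly (up to rounding)
print(sum(deviations))
```

The cancellation holds for any data set, which is why the mean absolute deviation or squared deviations are used to measure spread instead of the raw deviations.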

2. In an effort to establish a normative value for heart rate
variation to deep breathing (RR_VAR), a noninvasive test
used to assess suspected cardiovascular autonomic dysfunction, Gelber
and associates (1997) evaluated heart rate variation data from 580
patients in 63 different locations over a period of 15 years. Using
the data set in a folder on the CD-ROM [available only with the book] entitled "Gelber," complete
the following:

a. Calculate the mean and standard deviation of heart
rate variation (RR_VAR).

b. Generate a frequency table of RR_VAR for patients using the
following categories: 2–20, 21–30, 31–40,
41–50, 51–60, 61–70, 71–80,
81–90, and > 90.

c. Generate box plots of RR_VAR according to gender.

d. Normal limits of many laboratory values are set by the
2.5th and 97.5th percentiles, so that the normal limits contain the
central 95% of the distribution. This approach was taken
by Gelber and colleagues when they developed norms for mean heart
rate variation to deep breathing and the Valsalva ratio. Find the
normal limits for heart rate variation.
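Readers without NCSS can reproduce the logic of parts a, b, and d in a few lines. The values below are randomly generated stand-ins for the RR_VAR measurements, since the actual Gelber data set is available only with the book.

```python
import random
from statistics import mean, stdev

# Hypothetical stand-in for the 580 RR_VAR measurements; the real Gelber
# data set ships with the book's CD-ROM and is not reproduced here.
random.seed(1)
rr_var = [random.uniform(2, 120) for _ in range(580)]

# Part (a): mean and standard deviation
print(round(mean(rr_var), 1), round(stdev(rr_var), 1))

# Part (b): frequency table using the exercise's class intervals
edges = [20, 30, 40, 50, 60, 70, 80, 90]
labels = ["2-20", "21-30", "31-40", "41-50",
          "51-60", "61-70", "71-80", "81-90", "> 90"]
counts = [0] * len(labels)
for v in rr_var:
    for i, e in enumerate(edges):
        if v <= e:
            counts[i] += 1
            break
    else:
        counts[-1] += 1
for label, count in zip(labels, counts):
    print(f"{label}: {count}")

# Part (d): normal limits as the 2.5th and 97.5th percentiles,
# so the limits bracket the central 95% of the distribution
s = sorted(rr_var)
lower = s[int(0.025 * len(s))]  # 2.5th percentile (simple rank method)
upper = s[int(0.975 * len(s))]  # 97.5th percentile
print(round(lower, 1), round(upper, 1))
```

With the real data, the same percentile calculation yields the normative limits reported by Gelber and colleagues.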

3. Again using the Gelber data set, generate a frequency table
of mean, median, minimum, and maximum heart rate variation for patients
in age categories. Use the age category column in the data set and
the Descriptives procedure in NCSS. There are 490 patients for whom
both age and heart rate variation are available. Your table should
look like Table 3–23. Estimate the overall mean age of
the patients. Estimate the overall mean heart rate variation.
Compare each to the mean calculated with NCSS. Which is more accurate?
Why?

Table 3–23.
Frequency Table Showing Mean Heart Rate Variation to Deep Breathing
Broken Down by 10-Year Age Groups.

                       Heart Rate Variation
Age        Count    Mean    Median    Minimum    Maximum
11–20        52     63.0     57.6       24.0      150.9
21–30       162     57.6     56.1       13.3      124.4
31–40       144     51.1     48.3       14.0      128.9
41–50        78     39.6     37.6        9.4      105.5
51–60        20     34.0     32.3        7.2       85.7
61–70        23     29.1     23.5        3.3       70.9
> 70         11     16.8     17.3        2.1       28.3
Total       490

Source: Data, used with
permission of the author and publisher, from Gelber DA, Pfeifer
M, Dawson B, Shumer M: Cardiovascular autonomic nervous system tests:
Determination of normative values and effect of confounding variables.
J Auton Nerv Syst 1997;62:40–44. Table produced with NCSS;
used with permission.

4. Use the data from the "Bossi" file to form
a 2 × 2 contingency table for the frequencies
of hematuria in columns and whether patients had RBC units > 5 (gt5rbc
in the data file). After you have found the numbers in the cells,
use the NCSS program for two proportions (under Analysis and Other)
to find the odds ratio.

5. What is the most likely shape of the distribution of observations
in the following studies?

a. The age of subjects in a study of patients with Crohn's
disease.

b. The number of babies delivered by all physicians who delivered
babies in a large city during the past year.

c. The number of patients transferred to a tertiary care hospital
by other hospitals in the region.

6. Draw frequency polygons to compare men's and women's SMAF scores
on mental functioning at time 1 in the study by Hébert
and coworkers (1997). Repeat for time 2. What do you conclude?

7. The computational formula for the standard deviation is

SD = √{[ΣX² − (ΣX)²/n] / (n − 1)}

Illustrate that the value of the standard deviation calculated
from this formula is equivalent to that found with the definitional
formula using shock index data in Table 3–3. From the section
titled, "The Standard Deviation," the value of
the standard deviation of shock index using the definitional formula
is 0.27. (Use the sums in Table 3–3 to save some calculations.)
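The equivalence is easy to verify numerically. The shock index values below are hypothetical stand-ins for Table 3–3, so they will not reproduce the 0.27 quoted above, but the two formulas agree on any data set.

```python
from math import sqrt

# Hypothetical shock index values standing in for Table 3-3
x = [0.55, 0.61, 0.47, 0.95, 0.72, 1.10, 0.58, 0.66]
n = len(x)
mean = sum(x) / n

# Definitional formula: squared deviations from the mean
sd_def = sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))

# Computational formula: sums and sums of squares only
sd_comp = sqrt((sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1))

print(abs(sd_def - sd_comp) < 1e-12)  # -> True
```

The computational form avoids a second pass over the data, which is why it appears in older texts and calculator routines; the two are algebraically identical.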

8. The following questions give brief descriptions of some studies;
you may wish to refer to the articles for more information.

a. Khalakdina and colleagues (2003) recruited patients
with cryptosporidiosis and age-matched controls. Subjects in both
groups were interviewed by telephone to obtain information about
previous exposures. What statistic is best to summarize their findings?

b. Brown and coworkers (2003) studied a group of 68 adults who
were allergic to ant stings; each subject was randomly assigned
to receive either venom immunotherapy or a placebo. After a sting challenge,
any reactions were recorded. What statistic is best to summarize
their findings?

c. Medical students were asked to assess their competency
in performing several cancer screening examinations in a study by
Lee and colleagues (2002). What statistic would be best to summarize the
average opinions by the students on each competency?

d. Grodstein and colleagues (2000) examined data from the
Nurses' Health Study to study the relationship between
duration of postmenopausal hormone therapy and the risk of coronary
heart disease in women. What statistics are best to describe the
distribution of duration of treatment in those who did and those
who did not subsequently experience coronary heart disease? What graphical
methods are appropriate?

e. Kreder and colleagues (2003) studied the effect of provider
volume on complication rates after total knee arthroplasty in patients.
Low provider volume was related to length of stay in hospital. What
graphical method is best to demonstrate the relationship between
provider volume and complication rate?

9. What measures of central tendency and dispersion are the most
appropriate to use with the following sets of data?

a. Salaries of 125 physicians in a clinic

b. The test scores of all medical students taking USMLE Step
I of the National Board Examination in a given year

c. Serum sodium levels of healthy individuals

d. Number of tender joints in 30 joints evaluated on a standard
examination for disease activity in rheumatoid arthritis patients

e. Presence of diarrhea in a group of infants

f. The disease stages for a group of patients with Reye's
syndrome (six stages, ranging from 0 = alert wakefulness
to 5 = unarousable, flaccid paralysis, areflexia, pupils
unresponsive)

g. The age at onset of breast cancer in females

h. The number of pills left in subjects' medicine
bottles when investigators in a study counted the pills to evaluate
compliance in taking medication

10. Examine the pattern of distribution of mean heart rate variation
for different age groups in Table 3–23 (Gelber et al, 1997).
What do you observe? How would you learn whether or not your hunch
is correct?

11. The correlation between age and heart rate variation is –0.45
(Gelber et al, 1997). How do you interpret this value? What are
the implications for norms for heart rate variation?

12. Refer to Figure 3–2 to answer the following questions:

a. What is the mean weight of girls 24 months old?

b. What is the 90th percentile for head circumference for
12-month-old girls?

c. What is the fifth percentile in weight for 12-month-old
girls?

13. Find the coefficient of variation of mean change in red blood
cell units for men and for women using the data from Bossi and colleagues
(1998). Does one sex have greater relative variation in the number
of red blood cells?
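The calculation follows directly from the definition CV = (SD/mean) × 100. The red-blood-cell values below are hypothetical, since the actual Bossi observations are in the book's data files.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (SD / mean) * 100: relative spread on a unit-free scale."""
    return stdev(values) / mean(values) * 100

# Hypothetical changes in red blood cell units for men and women;
# the actual Bossi observations are in the book's data files.
men = [2, 4, 5, 7, 9, 12]
women = [1, 2, 2, 3, 4, 5]
cv_men = coefficient_of_variation(men)
cv_women = coefficient_of_variation(women)
print(round(cv_men, 1), round(cv_women, 1))
```

Because the CV divides the spread by the mean, it permits comparison of relative variability between groups whose means (or even units of measurement) differ.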

14. Refer to the Physicians' Health Study (Steering
Committee of the Physicians' Health Study Research Group,
1989) in Table 3–19 to answer the following questions:

a. The authors used person-years of observation to calculate
the odds ratio. Calculate the relative risk using person-years of
observation and compare its value to the value we obtained. Give
some reasons for their similarity in magnitude. Under what circumstances
could they differ?

b. The authors also calculated the odds ratio adjusted for
age and use of beta-carotene. What do they mean by this statement?

c. How could the healthy volunteer effect contribute to the
finding of no difference in total mortality from cardiovascular
causes between the aspirin and placebo group?

15. From their own experiences in an urban public hospital, Kaku
and Lowenstein (1990) noted that stroke related to recreational
drug use was occurring more frequently in young people. To investigate
the problem, they identified all patients between 15 and 44 years
of age admitted to a given hospital and selected sex- and age-matched
controls from patients admitted to the hospital with acute medical
or surgical conditions for which recreational drug abuse has not
been shown to be a risk factor. Data are given in Table 3–24.
What is the odds ratio?

Table 3–24.
Data for Odds Ratio for Stroke with History of Drug Abuse.

                  Stroke    Control
Drug Abuse          73         18
No Drug Abuse      141        196
Total              214        214

Source: Reproduced, with
permission, from Kaku DA, Lowenstein DH: Emergence of recreational
drug abuse as a major risk factor for stroke in young adults. Ann
Intern Med 1990;113:821–827.
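As a check on a hand calculation, the cross-product odds ratio can be computed directly from the counts in Table 3–24:

```python
# Counts from Table 3-24 (Kaku and Lowenstein, 1990)
#                 Stroke   Control
# Drug abuse        73        18
# No drug abuse    141       196
a, b = 73, 18     # exposed (history of drug abuse)
c, d = 141, 196   # unexposed

odds_ratio = (a * d) / (b * c)  # cross-product ratio ad/bc
print(round(odds_ratio, 2))  # -> 5.64
```

An odds ratio well above 1 indicates that the odds of a history of recreational drug abuse are substantially higher among the stroke patients than among the matched controls.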

16. Group Exercise. Obtain a copy
of the study by Moore and colleagues (1991) from your medical library,
and answer the following questions:

a. What was the purpose of this study?

b. What was the study design?

c. Why were two groups of patients used in the study?

d. Examine the box plots in the article's Figure
1. What conclusions are possible from the plots?

e. Examine the box plots in the article's Figure
2. What do these plots tell you about pH levels in normal healthy
men?

17. Group Exercise. It is important
that scales recommended to physicians for use in assessing risk
or making management decisions be shown to be reliable and valid.
Select an area of interest, and consult some journal articles that
describe scales or decision rules. Evaluate whether the authors
presented adequate evidence for the reproducibility and validity
of these scales. What kind of reproducibility was established? What
type of validity? Are these sufficient to warrant the use of the
scale? (For example, if you are interested in assessing surgical
risk for noncardiac surgery, you can consult the articles on an
index of cardiac risk by Goldman [1995] and Goldman
and associates [1977], as well as a follow-up
report of an index developed by Detsky and colleagues [1986].)