Chapter 3. Summarizing Data & Presenting Data in Tables & Graphs

Key Concepts

All observations of subjects in a study are evaluated on a scale of measurement that determines how the observations should be summarized, displayed, and analyzed.
Nominal scales are used to categorize discrete characteristics.
Ordinal scales categorize characteristics that have an inherent order.
Numerical scales measure the amount or quantity of something.
Means measure the middle of the distribution of a numerical characteristic.
Medians measure the middle of the distribution of an ordinal characteristic or a numerical characteristic that is skewed.
The standard deviation is a measure of the spread of observations around the mean and is used in many statistical procedures.
The coefficient of variation is a measure of relative spread that permits the comparison of observations measured on different scales.
Percentiles are useful to compare an individual observation with a norm.
Stem-and-leaf plots are a combination of frequency tables and histograms that are useful in exploring the distribution of a set of observations.
Frequency tables show the number of observations having a specific characteristic.
Histograms, box plots, and frequency polygons display distributions of numerical observations.
Proportions and percentages are used to summarize nominal and ordinal data.
Rates describe the number of events that occur in a given period.
Prevalence and incidence are two important measures of morbidity.
Rates must be adjusted when populations being compared differ in an important confounding factor.
The relationship between two numerical characteristics is described by the correlation.
The relationship between two nominal characteristics is described by the risk ratio, odds ratio, and event rates.
Number needed to treat is a useful indication of the effectiveness of a given therapy or procedure.
Scatterplots illustrate the relationship between two numerical characteristics.
Poorly designed graphs and tables can mislead readers about the information they present.
Computer programs are essential in today's research environment, and skills to use and interpret them can be very useful.

Presenting Problems

Presenting Problem 1

Pulmonary embolism (PE) is a leading cause of morbidity and mortality. Clinical features are nonspecific and a certain diagnosis is often difficult to make. Attempts to simplify and improve the diagnostic process in evaluating patients for possible PE have been made by the introduction of two components: determination of pretest probability and D-dimer testing. Pretest probability is determined by developing explicit criteria for determining the clinical probability of PE. D-dimer assays measure the formation of D-dimer when cross-linked fibrin in thrombi is broken down by plasmin. Elevated levels of D-dimer can be used to detect deep venous thrombosis (DVT) and PE. Some D-dimer tests are very sensitive for DVT and a normal result can be used to exclude venous thromboembolism.

Kline and colleagues (2002) wished to develop a set of clinical criteria that would define a subgroup of patients with a pretest probability of PE of greater than 40% (high-risk group). These patients would be at too great a risk of experiencing a PE to have the diagnosis excluded on the basis of D-dimer testing. However, patients with a lower pretest probability (low-risk group), in whom a normal result can help rule out the diagnosis of PE, might be suitable candidates for D-dimer testing. Data were available for 934 patients with suspected PE at seven urban emergency departments (ED) in the United States. The investigators measured a number of potential risk factors for PE, and we look at some basic attributes of the observations in this chapter. A random sample of shock index observations on 18 patients is given in the section titled "Calculating Measures of Central Tendency," and the entire data set is in a folder on the CD-ROM [available only with the book] entitled "Kline."

Presenting Problem 2

The aging of the baby-boomers is leading to important demographic changes in the population, with significant implications for health care planners. Over the next 30 years in the United States, the proportion of people over the age of 75 years is expected to increase greatly. With the aging of the population, functional decline resulting in disability and morbidity is a major challenge to health care systems.

Hébert and coworkers (1997) designed a study to measure disability and functional changes over a 2-year period in a community-dwelling population age 75 years and older. A nurse interviewed 655 residents in their homes in Quebec, Canada. The Functional Autonomy Measurement System (SMAF), a 29-item rating scale measuring functional disability in five areas, was administered together with a questionnaire measuring health, cognitive function, and depression. Each individual was interviewed again 1 and 2 years later by the same nurse. The SMAF scale rates each item on a 4-point scale, where 0 is independent and 3 is dependent. Functional decline was defined by an increase of 5 points or more on the questionnaire, improvement by a decrease of 5 points or more, and stability as a change within ± 4 points. (The final analysis included 572 subjects, 504 of whom completed both follow-up interviews and 68 of whom died during the study.)

The authors wanted to summarize the data and estimate declines in functional status. They also wanted to examine the relationship between changes in scores over the two 1-year periods. Data are given in the section titled "Displaying Numerical Data in Tables & Graphs" and on the CD-ROM [available only with the book] in a folder entitled "Hébert."

Presenting Problem 3

A large study in a primary care clinic found that one in 20 women had experienced domestic violence (DV) in the previous year and one in four had experienced it sometime during her adult life. Children from violent homes often suffer behavioral, emotional, and physical health consequences either because they themselves are abused or because they have witnessed the abuse of their mother. Pediatricians have a unique opportunity to screen mothers because battered women may be more likely to obtain medical care for their children than for themselves.

Lapidus and colleagues (2002) at the Connecticut Children's Medical Center conducted a survey to assess DV education and training and the use of DV screening among a representative statewide sample of pediatricians and family physicians. They mailed self-administered surveys to 903 physicians identified as active members of the American Academy of Pediatrics and the American Academy of Family Physicians. The survey requested information on physician demographics and issues relating to DV. Domestic violence was defined as "past or current physical, sexual, emotional, or verbal harm to a woman caused by a spouse, partner, or family member." Overall, 49% of the physicians responded to the survey after a total of three mailings. The authors looked at the distribution of responses and calculated some measures of predictive factors. We will revisit this study in the chapter on survey research. In this chapter, we illustrate frequency tables and odds ratios calculated by these investigators. Data are on the CD-ROM [available only with the book] in a folder entitled "Lapidus."

Presenting Problem 4

Factor VIII is one of the procoagulants of the intrinsic pathway of coagulation. Hemophilia A, a disease affecting about 1 in 10,000 males, is a hereditary hemorrhagic disorder characterized by deficient or defective factor VIII. Acquired hemophilia is a much rarer hemorrhagic disorder affecting 1 person per million each year and characterized by spontaneous development of an autoantibody directed against factor VIII. Patients often present with ecchymosis, hematomas, hematuria, or compressive neuropathy. The hemorrhagic complications are fatal in 14–22% of patients. Underlying diseases, including autoimmune diseases and malignancies, are often associated with acquired hemophilia.

Optimal treatment is not yet established and, because the disease is so rare, no randomized controlled trials of treatment have been undertaken. A retrospective study of 34 patients with acquired hemophilia due to factor VIII inhibitors was conducted along with an extensive literature review to clarify the clinical characteristics of this disease and plan a prospective study of optimal treatment (Bossi et al, 1998). Information from the study is given in the section titled "Tables and Graphs for Nominal and Ordinal Data." The investigators want to summarize data on some risk factors for men and women separately.

Presenting Problem 5

Premature birth, especially after fewer than 32 weeks of gestation, is associated with a high incidence of respiratory distress syndrome and a form of chronic lung disease known as bronchopulmonary dysplasia. Lung disease is the principal cause of morbidity and mortality in premature infants.

Thyroid hormones stimulate fetal lung development in animals. Little thyroid hormone is transferred from mother to fetus, but thyrotropin-releasing hormone (TRH) given to the mother increases fetal serum concentrations of thyroid hormone. Several studies have shown that the antenatal administration of TRH reduces the incidence and severity of respiratory distress syndrome, chronic lung disease, and death in these high-risk infants. Two other studies showed no benefit from treatment with TRH.

Ballard and coinvestigators (1998) wanted to reassess the efficacy and safety of antenatal administration of TRH in improving pulmonary outcome in preterm infants. Most of the earlier studies were relatively small, and one had not been blinded. Also, changes in neonatal care implemented in the past decade, particularly the use of surfactant, improved the chances of survival of premature infants.

The study enrolled 996 women in active labor with gestations of at least 24 but fewer than 30 weeks into a randomized, double-blind, placebo-controlled trial of antenatal TRH. The women receiving active treatment were given four doses of 400 μg of TRH intravenously at 8-h intervals. Those receiving placebo were given normal saline. Both groups received glucocorticoids, and surfactant was given to the infants when clinically indicated. There were 1134 live births (844 single and 290 multiple) and 11 stillbirths.

Infants born at 32 or fewer weeks gestation constituted the group at risk for lung disease; those born at 33 weeks or later were not at risk for lung disease. Outcomes included infant death on or before the 28th day after delivery; chronic lung disease, defined as the need for oxygen therapy for 21 of the first 28 days of life; and the development of respiratory distress syndrome, defined as the need for oxygen and either assisted ventilation or radiologic findings. The authors wanted to find the risk of developing these outcomes in the TRH group compared with the placebo group. Selected results from the study are given in the section titled "Number Needed to Treat."

Purpose of the Chapter

This chapter introduces different kinds of data collected in medical research and demonstrates how to organize and present summaries of the data. Regardless of the particular research being done, investigators collect observations and generally want to transform them into tables or graphs or to present summary numbers, such as percentages or means. From a statistical perspective, it does not matter whether the observations are on people, animals, inanimate objects, or events. What matters is the kind of observations and the scale on which they are measured. These features determine the statistics used to summarize the data, called descriptive statistics, and the types of tables or graphs that best display and communicate the observations.

We use the data from the presenting problems to illustrate the steps involved in calculating the statistics because we believe that seeing the steps helps most people understand procedures. As we emphasize throughout this book, however, we expect that most people will use a computer to analyze data. In fact, this and following chapters contain numerous illustrations from some commonly used statistical computer programs, including NCSS contained on the CD-ROM [available only with the book].

Scales of Measurement

The scale for measuring a characteristic has implications for the way information is displayed and summarized. As we will see in later chapters, the scale of measurement—the precision with which a characteristic is measured—also determines the statistical methods for analyzing the data. The three scales of measurement that occur most often in medicine are nominal, ordinal, and numerical.

Nominal Scales

Nominal scales are used for the simplest level of measurement when data values fit into categories. For example, in Presenting Problem 5, Ballard and colleagues (1998) use the following nominal characteristic to describe the outcome in infants being treated with antenatal TRH: the development of respiratory distress syndrome. In this example, the observations are dichotomous or binary in that the outcome can take on only one of two values: yes or no. Although we talk about nominal data as being on the measurement scale, we do not actually measure nominal data; instead, we count the number of observations with or without the attribute of interest.

Many classifications in medical research are evaluated on a nominal scale. Outcomes of a medical treatment or surgical procedure, as well as the presence of possible risk factors, are often described as either occurring or not occurring. Outcomes may also be described with more than two categories, such as the classification of anemias as microcytic (including iron deficiency), macrocytic or megaloblastic (including vitamin B12 deficiency), and normocytic (often associated with chronic disease).

Data evaluated on a nominal scale are sometimes called qualitative observations, because they describe a quality of the person or thing studied, or categorical observations, because the values fit into categories. Nominal or qualitative data are generally described in terms of percentages or proportions, such as the fact that 38% of the patients in the study of patients with acquired hemophilia (Bossi et al, 1998) developed hematuria. Contingency tables and bar charts are most often used to display this type of information and are presented in the section titled "Tables and Graphs for Nominal and Ordinal Data."

Ordinal Scales

When an inherent order occurs among the categories, the observations are said to be measured on an ordinal scale. Observations are still classified, as with nominal scales, but some observations have more or are greater than other observations. Clinicians often use ordinal scales to determine a patient's amount of risk or the appropriate type of therapy. Tumors, for example, are staged according to their degree of development. The international classification for staging of carcinoma of the cervix is an ordinal scale from 0 to 4, in which stage 0 represents carcinoma in situ and stage 4 represents carcinoma extending beyond the pelvis or involving the mucosa of the bladder and rectum. The inherent order in this ordinal scale is, of course, that the prognosis for stage 4 is worse than that for stage 0.

Classifications based on the extent of disease are sometimes related to a patient's activity level. For example, rheumatoid arthritis is classified, according to the severity of disease, into four classes ranging from normal activity (class 1) to wheelchair-bound (class 4). Using the Functional Autonomy Measurement System developed by the World Health Organization, Hébert and coinvestigators (1997) studied the functional activity of elderly people who live in a community. Although order exists among categories in ordinal scales, the difference between two adjacent categories is not the same throughout the scale. To illustrate, Apgar scores, which describe the maturity of newborn infants, range from 0 to 10, with lower scores indicating depression of cardiorespiratory and neurologic functioning and higher scores indicating good functioning. The difference between scores of 8 and 9 probably does not have the same clinical implications as the difference between scores of 0 and 1.

Some scales consist of scores for multiple factors that are then added to get an overall index. An index frequently used to estimate the cardiac risk in noncardiac surgical procedures was developed by Goldman and his colleagues (1977, 1995). This index assigns points to a variety of risk factors, such as age over 70 years, history of an MI in the past 6 months, specific electrocardiogram abnormalities, and general physical status. The points are added to get an overall score from 0 to 53, which is used to indicate the risk of complications or death for different score levels.

A special type of ordered scale is a rank-order scale, in which observations are ranked from highest to lowest (or vice versa). For example, health providers could direct their education efforts aimed at the obstetric patient based on ranking the causes of low birthweight in infants, such as malnutrition, drug abuse, and inadequate prenatal care, from most common to least common. The duration of surgical procedures might be converted to a rank scale to obtain one measure of the difficulty of the procedure.

As with nominal scales, percentages and proportions are often used with ordinal scales. The entire set of data measured on an ordinal scale may be summarized by the median value, and we will describe how to find the median and what it means. Ordinal scales having a large number of values are sometimes treated as if they are numerical (see following section). The same types of tables and graphs used to display nominal data may also be used with ordinal data.

Numerical Scales

Observations for which the differences between numbers have meaning on a numerical scale are sometimes called quantitative observations because they measure the quantity of something. There are two types of numerical scales: continuousᵃ (interval) and discrete scales. A continuous scale has values on a continuum (eg, age); a discrete scale has values equal to integers (eg, number of fractures).

ᵃSome statisticians differentiate interval scales (with an arbitrary zero point) from ratio scales (with an absolute zero point); examples are temperature on a Celsius scale (interval) and temperature on a Kelvin scale (ratio). Little difference exists, however, in how measures on these two scales are treated statistically, so we call them both simply numerical.

If data need not be very precise, continuous data may be reported to the closest integer. Theoretically, however, more precise measurement is possible. Age is a continuous measure, and age recorded to the nearest year will generally suffice in studies of adults; however, for young children, age to the nearest month may be preferable. Other examples of continuous data include height, weight, length of time of survival, range of joint motion, and many laboratory values.

When a numerical observation can take on only integer values, the scale of measurement is discrete. For example, counts of things—number of pregnancies, number of previous operations, number of risk factors—are discrete measures.

In the study by Kline and colleagues (2002), several patient characteristics were evaluated, including shock index and presence of PE. The first characteristic is measured on a continuous numerical scale because it can take on any individual value in the possible range of values. Presence of PE has a nominal scale with only two values: presence or absence. In the study by Ballard and coworkers (1998), the number of infants who developed respiratory distress syndrome is an example of a discrete numerical scale.

Characteristics measured on a numerical scale are frequently displayed in a variety of tables and graphs. Means and standard deviations are generally used to summarize the values of numerical measures. We next examine ways to summarize and display numerical data and then return to the subject of ordinal and nominal data.

Summarizing Numerical Data with Numbers

When an investigator collects many observations, such as shock index or blood pressure in the study by Kline and colleagues (2002), numbers that summarize the data can communicate a lot of information.

Measures of the Middle

One of the most useful summary numbers is an indicator of the center of a distribution of observations—the middle or average value. The three measures of central tendency used in medicine and epidemiology are the mean, the median, and, to a lesser extent, the mode. All three are used for numerical data, and the median is used for ordinal data as well.

Calculating Measures of Central Tendency

The Mean

Although several means may be mathematically calculated, the arithmetic, or simple, mean is used most frequently in statistics and is the one generally referred to by the term "mean." The mean is the arithmetic average of the observations. It is symbolized by X̄ (called X-bar) and is calculated as follows: add the observations to obtain the sum and then divide by the number of observations.

The formula for the mean is written X̄ = ΣX/n, where Σ (Greek letter sigma) means to add, X represents the individual observations, and n is the number of observations.

Table 3–1 gives the value of the shock index, systolic blood pressure, and heart rate for 18 randomly selected patients in the D-dimer study (Kline et al, 2002). (We will learn about random sampling in Chapter 4.) The mean shock index for these 18 patients is

X̄ = (0.61 + 0.56 + ⋯ + 0.44)/18 = 12.41/18 = 0.69

Table 3–1. Shock Index for a Random Sample of 18 Patients.



Subject ID   Shock Index   Systolic Blood Pressure   Heart Rate
 1              0.61               139                    85
 2              0.56               151                    84
 3              0.52               201                   104
 4              0.33               170                    56
 5              0.45               123                    55
 6              0.74               121                    90
 7              0.73               119                    87
 8              0.92               100                    92
 9              0.42               164                    69
10              0.63               161                   102
11              0.55               164
12              0.50               138                    69
13              0.75               118                    89
14              0.82               130                   106
15              1.30               109                   142
16              1.29                92                   119
17              0.85               126                   107
18              0.44               139                    61

Source: Data, used with permission of the authors and publisher, Kline JA, Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use of D-dimer testing in emergency department patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Table produced with NCSS; used with permission.

The mean is used when the numbers can be added (ie, when the characteristics are measured on a numerical scale); it should not ordinarily be used with ordinal data because of the arbitrary nature of an ordinal scale. The mean is sensitive to extreme values in a set of observations, especially when the sample size is fairly small. For example, the values of 1.30 for subject 15 and 1.29 for subject 16 are relatively large compared with the others. If these two values were not present, the mean would be 0.612 instead of 0.689.
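Because the calculation is simple, it is easy to verify with a few lines of code. The sketch below is ours, not part of the study; it uses Python and the 18 shock index values from Table 3–1, so results computed from these rounded values can differ slightly from those in the text.

```python
# Shock index values for the 18 randomly sampled patients (Table 3-1)
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

mean = sum(shock_index) / len(shock_index)
print(f"Mean shock index: {mean:.3f}")  # 0.689

# The mean is sensitive to extreme values: drop the two largest observations
trimmed = [x for x in shock_index if x not in (1.30, 1.29)]
print(f"Mean without subjects 15 and 16: {sum(trimmed) / len(trimmed):.3f}")  # 0.614
```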

If the original observations are not available, the mean can be estimated from a frequency table. A weighted average is formed by multiplying each data value by the number of observations that have that value, adding the products, and dividing the sum by the number of observations. We have formed a frequency table of shock index observations in Table 3–2, and we can use it to estimate the mean shock index for all patients in the study. The weighted-average estimate of the mean, using the number of subjects and the midpoints in each interval, is

X̄ ≈ Σ(number in interval × interval midpoint)/932 ≈ 0.69

Table 3–2. Frequency Distribution of Shock Index in 10-Point Intervals.



Shock Index       Count   Cumulative Count   Percent   Cumulative Percent   Graph of Percent
0.40 or less        38          38             4.08           4.08          /
0.40 up to 0.50    104         142            11.16          15.24          ////
0.50 up to 0.60    198         340            21.24          36.48          ////////
0.60 up to 0.70    199         539            21.35          57.83          ////////
0.70 up to 0.80    155         694            16.63          74.46          //////
0.80 up to 0.90    102         796            10.94          85.41          ////
0.90 up to 1.00     60         856             6.44          91.85          //
1.00 up to 1.10     37         893             3.97          95.82          /
1.10 up to 1.20     19         912             2.04          97.85          /
1.20 or higher      20         932             2.15         100.00          /

Source: Data, used with permission of the authors and publisher, Kline JA, Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use of D-dimer testing in emergency department patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Table produced with NCSS; used with permission.

The value of the mean calculated from a frequency table is not always the same as the value obtained with raw numbers. In this example, the shock index means calculated from the raw numbers and the frequency table are very close. Investigators who calculate the mean for presentation in a paper or talk have the original observations, of course, and should use the exact formula. The formula for use with a frequency table is helpful when we as readers of an article do not have access to the raw data but want an estimate of the mean.
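The weighted-average estimate can also be scripted. In the sketch below, the midpoints assigned to the two open-ended classes (0.35 and 1.25) are our assumption, because open-ended intervals have no defined midpoint; any reasonable choice gives a similar estimate here.

```python
# Estimate the mean from the frequency table (Table 3-2) as a weighted average
counts    = [38, 104, 198, 199, 155, 102, 60, 37, 19, 20]
midpoints = [0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 1.05, 1.15, 1.25]

n = sum(counts)
weighted_mean = sum(f * m for f, m in zip(counts, midpoints)) / n
print(f"n = {n}, weighted mean = {weighted_mean:.2f}")  # n = 932, weighted mean = 0.69
```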

The Median

The median is the middle observation, that is, the point at which half the observations are smaller and half are larger. The median is sometimes symbolized by M or Md, but it has no conventional symbol. The procedure for calculating the median is as follows:

1. Arrange the observations from smallest to largest (or vice versa).
2. Count in to find the middle value. The median is the middle value for an odd number of observations; it is defined as the mean of the two middle values for an even number of observations.

For example, in rank order (from lowest to highest), the shock index values in Table 3–1 are as follows:

0.33, 0.42, 0.44, 0.45, 0.50, 0.52, 0.55, 0.56, 0.61, 0.63, 0.73, 0.74, 0.75, 0.82, 0.85, 0.92, 1.29, 1.30

For 18 observations, the median is the mean of the ninth and tenth values (0.61 and 0.63), or 0.62. The median tells us that half the shock index values in this group are less than 0.62 and half are greater than 0.62. We will learn later in this chapter that the median is easy to determine from a stem-and-leaf plot of the observations.

The median is less sensitive to extreme values than is the mean. For example, if the largest observation, 1.30, is excluded from the sample, the median would be the middle value, 0.61. The median is also used with ordinal observations.
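In code, the same even/odd rule is applied automatically by the statistics module in the Python standard library; a minimal sketch:

```python
import statistics

shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# Even number of observations: median is the mean of the two middle values
print(f"{statistics.median(shock_index):.2f}")               # 0.62

# Dropping the largest value leaves 17 observations; the median is the 9th value
print(f"{statistics.median(sorted(shock_index)[:-1]):.2f}")  # 0.61
```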

The Mode

The mode is the value that occurs most frequently. It is commonly used for a large number of observations when the researcher wants to designate the value that occurs most often. No single observation occurs most frequently among the data in Table 3–1. When a set of data has two modes, it is called bimodal. For frequency tables or a small number of observations, the mode is sometimes estimated by the modal class, which is the interval having the largest number of observations. For the shock index data in Table 3–2, the modal class is 0.60 up to 0.70, with 199 patients.

The Geometric Mean

Another measure of central tendency not used as often as the arithmetic mean or the median is the geometric mean, sometimes symbolized as GM or G. It is the nth root of the product of the n observations. In symbolic form, for n observations X1, X2, X3, . . . , Xn, the geometric mean is

GM = ⁿ√(X1 × X2 × X3 × ⋯ × Xn)

The geometric mean is generally used with data measured on a logarithmic scale, such as the dilution of the smallpox vaccine studied by Frey and colleagues (2002), a presenting problem in Chapter 5. Taking the logarithm of both sides of the preceding equation, we see that the logarithm of the geometric mean is equal to the mean of the logarithms of the observations.
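A short sketch illustrates both routes to the same answer; the titer-style values below are hypothetical, chosen only because their product has an exact root. (statistics.geometric_mean requires Python 3.8 or later.)

```python
import math
import statistics

titers = [4, 8, 16, 16, 32, 64]  # hypothetical data on a logarithmic scale

# Route 1: log GM equals the mean of the logarithms of the observations
log_mean = sum(math.log(x) for x in titers) / len(titers)
print(f"{math.exp(log_mean):.1f}")                 # 16.0

# Route 2: the standard library computes it directly
print(f"{statistics.geometric_mean(titers):.1f}")  # 16.0
```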

Use the CD-ROM [available only with the book] and find the mean, median, and mode for the shock index for all of the patients in the study by Kline and colleagues (2002). Repeat for patients who did and did not have a PE. Do you think the mean shock index is different for these two groups? In Chapter 6 we will learn how to answer this type of question.

Using Measures of Central Tendency

Which measure of central tendency is best with a particular set of observations? Two factors are important: the scale of measurement (ordinal or numerical) and the shape of the distribution of observations. Although distributions are discussed in more detail in Chapter 4, we consider here the notion of whether a distribution is symmetric about the mean or is skewed to the left or the right.

If outlying observations occur in only one direction—either a few small values or a few large ones—the distribution is said to be a skewed distribution. If the outlying values are small, the distribution is skewed to the left, or negatively skewed; if the outlying values are large, the distribution is skewed to the right, or positively skewed. A symmetric distribution has the same shape on both sides of the mean. Figure 3–1 gives examples of negatively skewed, positively skewed, and symmetric distributions.

The following facts help us as readers of articles know the shape of a distribution without actually seeing it.

1. If the mean and the median are equal, the distribution of observations is symmetric, generally as in Figures 3–1C and 3–1D.
2. If the mean is larger than the median, the distribution is skewed to the right, as in Figure 3–1B.
3. If the mean is smaller than the median, the distribution is skewed to the left, as in Figure 3–1A.

In a study of the increase in educational debt among Canadian medical students, Kwong and colleagues (2002) reported the median level of debt for graduating students. The investigators reported the median rather than the mean because a relatively small number of students had extremely high debts, which would cause the mean to be an overestimate. The following guidelines help us decide which measure of central tendency is best.

1. The mean is used for numerical data and for symmetric (not skewed) distributions.
2. The median is used for ordinal data or for numerical data if the distribution is skewed.
3. The mode is used primarily for bimodal distributions.
4. The geometric mean is generally used for observations measured on a logarithmic scale.

Measures of Spread

Suppose all you know about the 18 randomly selected patients in Presenting Problem 1 is that the mean shock index is 0.69. Although the mean provides useful information, you have a better idea of the distribution of shock indices in these patients if you know something about the spread, or the variation, of the observations. Several statistics are used to describe the dispersion of data: range, standard deviation, coefficient of variation, percentile rank, and interquartile range. All are described in the following sections.

Calculating Measures of Spread

The Range

The range is the difference between the largest and the smallest observation. It is easy to determine once the data have been arranged in rank order. For example, the lowest shock index among the 18 patients is 0.33, and the highest is 1.30; thus, the range is 1.30 minus 0.33, or 0.97. Many authors give minimum and maximum values instead of the range, and in some ways these values are more useful.

The Standard Deviation

The standard deviation is the most commonly used measure of dispersion with medical and health data. Although its meaning and computation are somewhat complex, it is very important because it is used both to describe how observations cluster around the mean and in many statistical tests. Most of you will use a computer to determine the standard deviation, but we present the steps involved in its calculation to give a greater understanding of the meaning of this statistic.

The standard deviation is a measure of the spread of data about their mean. Briefly looking at the logic behind this statistic, we need a measure of the "average" spread of the observations about the mean. Why not find the deviation of each observation from the mean, add these deviations, and divide the sum by n to form an analogy to the mean itself? The problem is that the sum of deviations about the mean is always zero (see Exercise 1). Why not use the absolute values of the deviations? The absolute value of a number ignores the sign of the number and is denoted by vertical bars on each side of the number. For example, the absolute value of 5, |5|, is 5, and the absolute value of –5, |–5|, is also 5. Although this approach avoids the zero sum problem, it lacks some important statistical properties, and so is not used. Instead, the deviations are squared before adding them, and then the square root is found to express the standard deviation on the original scale of measurement. The standard deviation is symbolized as SD, sd, or simply s (in this text we use SD), and its formula is

SD = √[Σ(X – X̄)² / (n – 1)]

The name of the statistic before the square root is taken is the variance, but the standard deviation is the statistic of primary interest.

Using n – 1 instead of n in the denominator produces a more accurate estimate of the true population standard deviation and has desirable mathematical properties for statistical inferences.

The preceding formula for standard deviation, called the definitional formula, is not the easiest one for calculations. Another formula, the computational formula, is generally used instead. Because we generally compute the standard deviation using a computer, the illustrations in this text use the more meaningful but computationally less efficient formula. If you are curious, the computational formula is given in Exercise 7.

Now let's try a calculation. The shock index values for the 18 patients are repeated in Table 3–3 along with the computations needed. The steps follow:

Table 3–3. Calculations for Standard Deviation of Shock Index in a Random Sample of 18 Patients.



Patient      X      X – X̄    (X – X̄)²
 1         0.61     –0.08      0.01
 2         0.56     –0.13      0.02
 3         0.52     –0.17      0.03
 4         0.33     –0.36      0.13
 5         0.45     –0.24      0.06
 6         0.74      0.05      0.00
 7         0.73      0.04      0.00
 8         0.92      0.23      0.05
 9         0.42     –0.27      0.07
10         0.63     –0.06      0.00
11         0.55     –0.14      0.02
12         0.50     –0.19      0.04
13         0.75      0.06      0.00
14         0.82      0.13      0.02
15         1.30      0.61      0.38
16         1.29      0.60      0.36
17         0.85      0.16      0.03
18         0.44     –0.25      0.06
Sums      12.41                1.28
Mean       0.69

Source: Data, used with permission of the authors and publisher, Kline JA, Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use of D-dimer testing in emergency department patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Table produced with Microsoft Excel.

1. Let X be the shock index for each patient, and find the mean: the mean is 0.69, as we calculated earlier.
2. Subtract the mean from each observation to form the deviations X – X̄.
3. Square each deviation to form (X – X̄)².
4. Add the squared deviations; from Table 3–3, the sum is 1.28.
5. Divide the result in step 4 by n – 1: 1.28/17 = 0.075. This value is the variance.
6. Take the square root of the value in step 5 to find the standard deviation: √0.075 = 0.274, or 0.27. (The actual value is 0.275 or 0.28; the small difference is due to round-off error.)

But note the relatively large squared deviation of 0.38 for patient 15 in Table 3–3. It contributes substantially to the variation in the data. The standard deviation of the remaining 17 patients (after eliminating patient 15) is smaller, 0.235, demonstrating the effect that outlying observations can have on the value of the standard deviation.
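The six steps translate directly into code. This sketch uses the definitional formula on the rounded values from Table 3–1, so the results differ slightly from the hand calculation in the text.

```python
import math
import statistics

shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

n = len(shock_index)
mean = sum(shock_index) / n
# Sum the squared deviations, divide by n - 1, then take the square root
variance = sum((x - mean) ** 2 for x in shock_index) / (n - 1)
print(f"SD = {math.sqrt(variance):.3f}")  # 0.273

# Removing patient 15 (shock index 1.30) shrinks the standard deviation
without_15 = [x for x in shock_index if x != 1.30]
print(f"SD without patient 15 = {statistics.stdev(without_15):.3f}")  # 0.234
```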

The standard deviation, like the mean, requires numerical data. Also, like the mean, the standard deviation is a very important statistic. First, it is an essential part of many statistical tests as we will see in later chapters. Second, the standard deviation is very useful in describing the spread of the observations about the mean value. Two rules of thumb when using the standard deviation are:

1. Regardless of how the observations are distributed, at least 75% of the values always lie between these two numbers: the mean minus 2 standard deviations and the mean plus 2 standard deviations. In the shock index example, the mean is 0.69 and the standard deviation is 0.28; therefore, at least 75% lie between 0.69 ± 2(0.28), or between 0.13 and 1.25. In this example, 16 of the 18 observations, or 89%, fall between these limits.
2. If the distribution of observations is bell-shaped, even more can be said about the percentage of observations that lie within 1, 2, or 3 standard deviations of the mean. For a bell-shaped distribution, approximately:
67% of the observations lie between the mean ± 1 standard deviation
95% of the observations lie between the mean ± 2 standard deviations
99.7% of the observations lie between the mean ± 3 standard deviations

The standard deviation, along with the mean, can be helpful in determining skewness when only summary statistics are given: if the mean minus 2 SD is less than zero (ie, the mean is smaller than 2 SD) for a characteristic that cannot be negative, the observations are probably skewed.
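The first rule of thumb is easy to check numerically; a sketch using the rounded mean and standard deviation quoted in the text:

```python
mean, sd = 0.69, 0.28
low, high = mean - 2 * sd, mean + 2 * sd  # 0.13 to 1.25

shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

inside = [x for x in shock_index if low <= x <= high]
pct = 100 * len(inside) / len(shock_index)
print(f"{len(inside)} of {len(shock_index)} observations ({pct:.0f}%) lie within 2 SD")
# 16 of 18 observations (89%) lie within 2 SD, above the 75% minimum
```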

Use the CD-ROM [available only with the book], and find the range and standard deviation of shock index for all of the patients in the Kline and colleagues study (2002). Repeat for patients with and without a PE. Are the distributions of shock index similar in these two groups of patients?

The Coefficient of Variation

The coefficient of variation (CV) is a useful measure of relative spread in data and is used frequently in the biologic sciences. For example, suppose Kline and his colleagues (2002) wanted to compare the variability in shock index with the variability in systolic blood pressure (BP) in the patients in their study. The mean and the standard deviation of shock index in the total sample are 0.69 and 0.20, respectively; for systolic BP, they are 138 and 26, respectively. A comparison of the standard deviations makes no sense because shock index and systolic BP are measured on much different scales. The coefficient of variation adjusts the scales so that a sensible comparison can be made.

The coefficient of variation is defined as the standard deviation divided by the mean times 100%. It produces a measure of relative variation—variation that is relative to the size of the mean. The formula for the coefficient of variation is

CV = (SD/X̄) × 100%

From this formula, the CV for shock index is (0.20/0.69)(100%) = 29.0%, and the CV for systolic BP is (26/138)(100%) = 18.8%. We can therefore conclude that the relative variation in shock index is considerably greater than the variation in systolic BP. A frequent application of the coefficient of variation in the health field is in laboratory testing and quality control procedures.
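The formula is a one-line function; a minimal sketch with the summary numbers from the text:

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV = (SD / mean) x 100%."""
    return sd / mean * 100

print(f"Shock index: CV = {coefficient_of_variation(0.20, 0.69):.1f}%")  # 29.0%
print(f"Systolic BP: CV = {coefficient_of_variation(26, 138):.1f}%")     # 18.8%
```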

Use the CD-ROM [available only with the book] and find the coefficient of variation for shock index for patients who did and did not have a PE in the Kline and colleagues study.

Percentiles

A percentile is the percentage of a distribution that is equal to or below a particular number. For example, consider the standard physical growth chart for girls from birth to 36 months old given in Figure 3–2. For girls 21 months of age, the 95th percentile of weight is 12 kg, as noted by the arrow in the chart. This percentile means that among 21-month-old girls, 95% weigh 12 kg or less and only 5% weigh more than 12 kg. The 50th percentile is, of course, the same value as the median; for 21-month-old girls, the median or 50th percentile weight is approximately 10.6 kg.

Percentiles are often used to compare an individual value with a norm. They are extensively used to develop and interpret physical growth charts and measurements of ability and intelligence. They also determine normal ranges of laboratory values; the "normal limits" of many laboratory values are set by the 2½ and 97½ percentiles, so that the normal limits contain the central 95% of the distribution. This approach was taken in a study by Gelber and colleagues (1997) when they developed norms for mean heart variation to breathing and Valsalva ratio (see Exercise 2).

Interquartile Range

A measure of variation that makes use of percentiles is the interquartile range, defined as the difference between the 25th and 75th percentiles, also called the first and third quartiles, respectively. The interquartile range contains the central 50% of observations. For example, the interquartile range of weights of girls who are 9 months of age (see Figure 3–2) is the difference between 7.5 kg (the 75th percentile) and 6.5 kg (the 25th percentile); that is, 50% of infant girls weigh between 6.5 and 7.5 kg at 9 months of age.
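Quartiles and the interquartile range can be computed directly, as in the sketch below. Note that statistical programs interpolate percentiles in slightly different ways, so values computed near the quartiles may not match hand calculations exactly.

```python
import statistics

shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

# quantiles(n=4) returns the three cut points dividing the data into quarters
q1, median, q3 = statistics.quantiles(shock_index, n=4)
print(f"25th = {q1:.3f}, 50th = {median:.3f}, 75th = {q3:.3f}")  # 50th = 0.620
print(f"Interquartile range = {q3 - q1:.3f}")
```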

Using Different Measures of Dispersion

The following guidelines are useful in deciding which measure of dispersion is most appropriate for a given set of data.

   

1. The standard deviation is used when the mean is used (ie, with symmetric numerical data).

2. Percentiles and the interquartile range are used in two situations:

   

a. When the median is used (ie, with ordinal data or with skewed numerical data).

b. When the mean is used but the objective is to compare individual observations with a set of norms.

3. The interquartile range is used to describe the central 50% of a distribution, regardless of its shape.

4. The range is used with numerical data when the purpose is to emphasize extreme values.

5. The coefficient of variation is used when the intent is to compare distributions measured on different scales.

Displaying Numerical Data in Tables & Graphs

We all know the saying, "A picture is worth 1000 words," and researchers in the health field certainly make frequent use of graphic and pictorial displays of data. Numerical data may be presented in a variety of ways, and we will use the data from the study by Hébert and colleagues (1997) on functional decline in the elderly (Presenting Problem 2) to illustrate some of the more common methods. The subjects in this study were 75 years of age or older. We use a subset of their data, the 72 patients age 85 years or older who completed the Functional Autonomy Measurement System (SMAF). The total score on the SMAF for these subjects at time 1 and time 3, along with the difference in score between time 3 and time 1, are given in Table 3–4.

Table 3–4. Difference in Total Score on the Functional Autonomy Measurement System for Patients Age 85 Years or Older. Positive Differences Indicate a Decline.



Age   Sex   SMAF at Time 1   SMAF at Time 3   Difference (Time 3 – Time 1)
90    F          28               20                   –8
88    F           8               11                    3
88    F           6                9                    3
90    F          22               18                   –4
88    M           6                7                    1
86    F           9                9                    0
86    M          23               15                   –8
85    F          12               40                   28
88    F           9               30                   21
86    F           5               15                   10
95    F          20               16                   –4
88    F           3               26                   23
87    F          22               24                    2
86    F          20               20                    0
86    M           0                1                    1
93    F          30               34                    4
87    F          13               23                   10
94    F          47               52                    5
86    F           1               20                   19
85    F           3               50                   47
87    M           4               57                   53
89    F          12               14                    2
87    F           1                4                    3
87    F          13               16                    3
85    F           1                1                    0
85    F          35               30                   –5
88    F          22               19                   –3
88    M           1                1                    0
86    F           2               17                   15
88    M           3                3                    0
86    F          21               39                   18
85    F           2                2                    0
85    M           7                8                    1
88    M           8               10                    2
85    F           7                5                   –2
89    F          11               20                    9
87    F           1                0                   –1
88    F          12               19                    7
87    F          19               56                   37
94    F          21               16                   –5
86    M          17               26                    9
85    F          27               21                   –6
85    M           4                2                   –2
85    F           9                5                   –4
85    M           7               34                   27
87    F          38               34                   –4
85    F          13               22                    9
85    F           4                4                    0
85    F          17               27                   10
90    F          23               27                    4
86    M          12               13                    1
88    M          30               29                   –1
85    M          27               26                   –1
87    F          26               47                   21
86    M          44               46                    2
85    F          21               23                    2
86    M          17               57                   40
88    M          10               19                    9
85    F          15               22                    7
86    F           4                6                    2
88    F          10               12                    2
88    M          18               22                    4
87    M          12               20                    8
85    M          37               47                   10
85    F          17               14                   –3
89    F          14               19                    5
85    F          11               14                    3
87    F           4                6                    2
86    F          16               26                   10
90    F           5                6                    1
85    F          48               51                    3
88    M           9               17                    8

Source: Data, used with permission of the author and the publisher, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling very elderly population. Am J Epidemiol 1997;145:935–944.

Stem-and-Leaf Plots

Stem-and-leaf plots are graphs developed in 1977 by Tukey, a statistician interested in meaningful ways to communicate by visual display. They provide a convenient means of tallying the observations and can be used as a direct display of data or as a preliminary step in constructing a frequency table. The observations in Table 3–4 show that many of the differences in total scores are small, but also that some people have large positive scores, indicating large declines in function. The data are not easy to understand, however, by simply looking at a list of the raw numbers. The first step in organizing data for a stem-and-leaf plot is to decide on the number of subdivisions, called classes or intervals (it should generally be between 6 and 14; more details on this decision are given in the following section). Initially, we categorize observations by 5s, from –9 to –5, –4 to 0, 1 to 5, 6 to 10, 11 to 15, 16 to 20, and so on.

To form a stem-and-leaf plot, draw a vertical line, and place the first digits of each class—called the stem—on the left side of the line, as in Table 3–5. The numbers on the right side of the vertical line represent the second digit of each observation; they are the leaves. The steps in building a stem-and-leaf plot are as follows:

Table 3–5. Constructing a Stem-and-Leaf Plot of Change in Total Function Scores Using 5-Point Categories: Observations for the First 10 Subjects.



Stem          Leaves
–9 to –5      8 8
–4 to 0       4 0
+1 to +5      3 3 1
+6 to +10     0
+11 to +15
+16 to +20
+21 to +25    1
+26 to +30    8
+31 to +35
+36 to +40
+41 to +45
+46 to +50
+51 to +55
+56 to +60

Source: Data, used with permission of the authors and the publisher, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling, very elderly population. Am J Epidemiol 1997;145:935–944.

1. Take the score of the first person, –8, and write the second digit, 8, or leaf, on the right side of the vertical line, opposite the first digit, or stem, corresponding to –9 to –5.
2. For the second person, write the 3 (leaf) on the right side of the vertical line opposite 1 to 5 (stem).
3. For the third person, write the 3 (leaf) opposite 1 to 5 (stem) next to the previous score of 3.
4. For the fourth person, write the –4 (leaf) opposite –4 to 0 (stem); and so on.
5. When the observation is only one digit, such as for subjects 1 through 7 in Table 3–4, that digit is the leaf.
6. When the observation is two digits, however, such as the score of 28 for subject 8, only the second digit, or 8 in this case, is written.

The leaves for the first ten people are given in Table 3–5. The complete stem-and-leaf plot for the score changes of all the subjects is given in Table 3–6. The plot both provides a tally of observations and shows how the changes in scores are distributed. The choice of class widths of 5 points is reasonable, although we usually prefer to avoid having many empty classes at the high end of the scale. It is generally preferred to have equal class widths and to avoid open-ended intervals, such as 30 or higher, although some might choose to combine the higher classes in the final plot.

Table 3–6. Stem-and-Leaf Plot of Change in Total Function Scores Using 5-Point Categories.



Stem          Leaves
–9 to –5      8 8 5 5 6
–4 to 0       4 0 4 0 0 3 0 0 0 2 1 2 4 4 0 1 1 3
+1 to +5      3 3 1 2 1 4 5 2 3 3 1 2 4 1 2 2 2 2 4 5 3 2 1 3
+6 to +10     0 0 9 7 9 9 0 9 7 8 0 8 0
+11 to +15    5
+16 to +20    9 8
+21 to +25    1 3 1
+26 to +30    8 7
+31 to +35
+36 to +40    7 0
+41 to +45
+46 to +50    7
+51 to +55    3
+56 to +60

Source: Data, used with permission of the authors and the publisher, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling, very elderly population. Am J Epidemiol 1997;145:935–944.

Usually the leaves are reordered from lowest to highest within each class. After the reordering, it is easy to locate the median of the distribution by simply counting in from either end.
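A stem-and-leaf tally is straightforward to program. The sketch below is our code; it uses the 72 differences from Table 3–4 and the same 5-point classes as Table 3–6, appending leaves in data order (sorting each list of leaves before printing would produce the reordered version just described).

```python
from collections import defaultdict

# SMAF score differences (time 3 - time 1) from Table 3-4
diffs = [-8, 3, 3, -4, 1, 0, -8, 28, 21, 10, -4, 23, 2, 0, 1, 4, 10, 5, 19, 47,
         53, 2, 3, 3, 0, -5, -3, 0, 15, 0, 18, 0, 1, 2, -2, 9, -1, 7, 37, -5,
         9, -6, -2, -4, 27, -4, 9, 0, 10, 4, 1, -1, -1, 21, 2, 2, 40, 9, 7, 2,
         2, 4, 8, 10, -3, 5, 3, 2, 10, 1, 3, 8]

def class_start(value: int) -> int:
    """Lower bound of the 5-point class containing value, eg 1 for 1 to 5."""
    return (value - 1) // 5 * 5 + 1

stems = defaultdict(list)
for d in diffs:
    stems[class_start(d)].append(abs(d) % 10)  # the leaf is the last digit

for lower in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[lower])
    print(f"{lower:+3d} to {lower + 4:+3d} | {leaves}")
```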

Use the CD-ROM [available only with the book] and the routine for generating stem-and-leaf plots with the data on shock index separately for patients who did and did not have a PE in the Kline and colleagues study (2002).

Frequency Tables

Scientific journals often present information in frequency distributions or frequency tables. The scale of the observations must first be divided into classes, as in stem-and-leaf plots. The number of observations in each class is then counted. The steps for constructing a frequency table are as follows:

   

1. Identify the largest and smallest observations.

2. Subtract the smallest observation from the largest to obtain the range.

3. Determine the number of classes. Common sense is usually adequate for making this decision, but the following guidelines may be helpful.

   

a. Between 6 and 14 classes is generally adequate to provide enough information without being overly detailed.

b. The number of classes should be large enough to demonstrate the shape of the distribution but not so large that minor fluctuations are noticeable.

4. One approach is to divide the range of observations by the number of classes to obtain the width of the classes. For some applications, deciding on the class width first may make more sense; then use the class width to determine the number of classes. The following are some guidelines for determining class width.

   

a. Class limits (beginning and ending numbers) must not overlap. For example, they must be stated as "40–49" or "40 up to 50," not as "40–50" or "50–60." Otherwise, we cannot tell the class to which an observation of 50 belongs.

b. If possible, class widths should be equal. Unequal class widths present graphing problems and should be used only when large gaps occur in the data.

c. If possible, open-ended classes at the upper or lower end of the range should be avoided because they do not accurately communicate the range of the observations. We used open-ended classes in Table 3–2 when we had the categories of 0.40 or less and 1.20 or higher.

d. If possible, class limits should be chosen so that most of the observations in the class are closer to the midpoint of the class than to either end of the class. Doing so results in a better estimate of the raw data mean when the weighted mean is calculated from a frequency table (see the section titled "The Mean" and Exercise 3).

5. Tally the number of observations in each class. If you are constructing a stem-and-leaf plot, the actual value of the observation is noted. If you are constructing a frequency table, you need use only the number of observations that fall within the class.

Computer programs generally list each value, along with its frequency. Users of the programs must designate the class limits if they want to form frequency tables for values in specific intervals, such as in Table 3–2, by recoding the original observations.

Some tables present only frequencies (number of patients or subjects); others present percentages as well. Percentages are found by dividing the number of observations in a given class, ni, by the total number of observations, n, and then multiplying by 100. For example, for the shock index class from 0.40 up to 0.50 in Table 3–2, the percentage is

(104/932) × 100% = 11.16%

For some applications, cumulative frequencies, or percentages, are desirable. The cumulative frequency is the percentage of observations for a given value plus that for all lower values. The cumulative value in the last column of Table 3–2, for instance, shows that almost 75% of patients had a shock index less than 0.80.
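The same counting logic can be scripted. The sketch below builds a small Table 3–2-style frequency table for the 18 sampled values; the convention that a value exactly equal to a class boundary goes into the higher class is our choice (no sampled value falls on a boundary, so it does not matter here).

```python
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

edges = [0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.10, 1.20]
labels = (["0.40 or less"]
          + [f"{lo:.2f} up to {hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])]
          + ["1.20 or higher"])

counts = [0] * len(labels)
for x in shock_index:
    counts[sum(x >= e for e in edges)] += 1  # index = number of edges at or below x

n, cumulative = len(shock_index), 0
for label, count in zip(labels, counts):
    cumulative += count
    print(f"{label:<16} {count:>3} {100 * count / n:6.2f}% {100 * cumulative / n:7.2f}%")
```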

Histograms, Box Plots, & Frequency Polygons

Graphs are used extensively in medicine—in journals, in presentations at professional meetings, and in advertising literature. Graphic devices especially useful in medicine are histograms, box plots, error plots, line graphs, and scatterplots.

Histograms

A histogram of the changes in scores for elderly subjects in the Hébert and coworkers' (1997) study of functional stability is shown in Figure 3–3. Histograms usually present the measure of interest along the X-axis and the number or percentage of observations along the Y-axis. Whether numbers or percentages are used depends on the purpose of the histogram. For example, percentages are needed when two histograms based on different numbers of subjects are compared.

You may notice that the numbers of patients in each class are different from the numbers we found when creating the stem-and-leaf plots. This incongruity occurs because most statistical computer programs determine the class limits automatically. As with frequency tables, it is possible to recode the numbers into a new measure if you want to specify the class limits.

Note that the area of each bar is in proportion to the percentage of observations in that interval; for example, the nine observations in the –5 interval (values between –7.5 and –2.5) account for 9/72, or 12.5%, of the area covered by this histogram. A histogram therefore communicates information about area, one reason the width of classes should be equal; otherwise the heights of columns in the histogram must be appropriately modified to maintain the correct area. For example, in Figure 3–3, if the lowest class were 10 score points wide (from –12.5 to –2.5) and all other classes remained 5 score points wide, 11 observations would fall in the interval. The height of the column for that interval should then be only 5.5 units (instead of 11 units) to compensate for its doubled width.
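A figure like Figure 3–3 can be reproduced with matplotlib; in this sketch the bin edges and axis labels are our choices, not taken from the original figure.

```python
import matplotlib.pyplot as plt

# SMAF score differences (time 3 - time 1) from Table 3-4
diffs = [-8, 3, 3, -4, 1, 0, -8, 28, 21, 10, -4, 23, 2, 0, 1, 4, 10, 5, 19, 47,
         53, 2, 3, 3, 0, -5, -3, 0, 15, 0, 18, 0, 1, 2, -2, 9, -1, 7, 37, -5,
         9, -6, -2, -4, 27, -4, 9, 0, 10, 4, 1, -1, -1, 21, 2, 2, 40, 9, 7, 2,
         2, 4, 8, 10, -3, 5, 3, 2, 10, 1, 3, 8]

# 5-point bins centered on -5, 0, 5, ... so equal-width bars keep areas comparable
edges = [x + 0.5 for x in range(-13, 58, 5)]  # -12.5, -7.5, ..., 57.5
plt.hist(diffs, bins=edges, edgecolor="black")
plt.xlabel("Change in SMAF score (time 3 - time 1)")
plt.ylabel("Number of subjects")
plt.show()
```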

Box Plots

A box plot, sometimes called a box-and-whisker plot by Tukey (1977), is another way to display information when the objective is to illustrate certain locations in the distribution. It can be constructed from the information in a stem-and-leaf plot or a frequency table. A stem-and-leaf plot for patients 85 years of age or older is given in Table 3–7. The median and the first and third quartiles of the distribution are used in constructing box plots. Computer programs do not routinely denote the median and the 25th and 75th percentiles with stem-and-leaf plots, but it is easy to request this information, as illustrated in Table 3–7. The median change in SMAF score is 2.5, the 75th percentile is 9, and the 25th percentile is 0.

Table 3–7. Descriptive Information and Stem-and-Leaf Plot of SMAF Score Changes for Subjects 85 Years Old or Older.


Quartile Section of SMAF Score Changes

Parameter   10th Percentile   25th Percentile   50th Percentile   75th Percentile   90th Percentile
Value             –4                 0                2.5                9               22.4
95% LCL           –6                –3                1                  5               10
95% UCL           –2                 1                4                 18               40

Stem-and-Leaf Section of SMAF Score Changes

Depth   Stem   Leaves
  2     –0•    8 8
  3     –0S    6
  9     –0F    5 5 4 4 4 4
 13     –0T    3 3 2 2
 19     –0*    1 1 1 0 0 0
 28      0*    0 0 0 0 1 1 1 1 1
(14)     0T    2 2 2 2 2 2 2 2 3 3 3 3 3 3
 30      0F    4 4 4 5 5
 25      0S    7 7
 23      0•    8 8 9 9 9 9
 17      1*    0 0 0 0 0
 12      1T
 12      1F    5
 11      1S
 11      1•    8 9
  9      2*    1 1
High: 23, 27, 28, 37, 40, 47, 53
Unit = 1   Example: 1 | 2 represents 12

Source: Data, used with permission of the authors and the publisher, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling, very elderly population. Am J Epidemiol 1997;145:935–944. Plot produced with NCSS; used with permission.

Note that the values for the stem are different from the ones we used because most computer programs determine the stem values. In Table 3–7, there is a stem value for every two values of change in SMAF score. Although it is not very intuitive, each stem represents a class of size 2, and the symbols denote the trailing digits: * for numbers ending in 0 and 1; T for 2 and 3; F for 4 and 5; S for 6 and 7; and • for 8 and 9.

A box plot of the changes in SMAF scores for patients 85 years or older is given in Figure 3–4.ᵇ A box is drawn with the top at the third quartile and the bottom at the first quartile; quartiles are sometimes referred to as hinges in box plots. The length of the box is a visual representation of the interquartile range, representing the middle 50% of the data. The width of the box is chosen to be pleasing esthetically. The location of the midpoint or median of the distribution is indicated with a horizontal line in the box. Finally, straight lines, or whiskers, extend 1.5 times the interquartile range above and below the 75th and 25th percentiles. Any values above or below the whiskers are called outliers.

ᵇFor this analysis, we selected the following patients: age ≥ 85 years, score on the total SMAF at time 3 > –1.

Box plots communicate a great deal of information; for example, we can easily see from Figure 3–4 that the score changes range from about –10 to about 55 (actually, from –8 to 53). Half of the score changes were between about 0 and 8, and the median is a little larger than 0. There are seven outlying values; four patients had score changes greater than 35 points.
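A box plot like Figure 3–4 takes only a few lines; a matplotlib sketch with the same data (whis=1.5 sets the whisker limit at 1.5 times the interquartile range, matching the construction described above, and points beyond it are drawn as outliers):

```python
import matplotlib.pyplot as plt

# SMAF score differences (time 3 - time 1) from Table 3-4
diffs = [-8, 3, 3, -4, 1, 0, -8, 28, 21, 10, -4, 23, 2, 0, 1, 4, 10, 5, 19, 47,
         53, 2, 3, 3, 0, -5, -3, 0, 15, 0, 18, 0, 1, 2, -2, 9, -1, 7, 37, -5,
         9, -6, -2, -4, 27, -4, 9, 0, 10, 4, 1, -1, -1, 21, 2, 2, 40, 9, 7, 2,
         2, 4, 8, 10, -3, 5, 3, 2, 10, 1, 3, 8]

plt.boxplot(diffs, whis=1.5)
plt.ylabel("Change in SMAF score (time 3 - time 1)")
plt.show()
```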

Use the CD-ROM [available only with the book] to generate box plots for shock index separately for patients with and without a PE in the Kline and colleagues study (2002). Do these graphs enhance your understanding of the distributions?

Frequency Polygons

Frequency polygons are line graphs similar to histograms and are especially useful when comparing two distributions on the same graph. As a first step in constructing a frequency polygon, a stem-and-leaf plot or frequency table is generated. Table 3–8 contains the frequency distributions of shock index for patients who did and did not have a PE.

Table 3–8. Frequency Table for Shock Index.



A. Shock Index in Patients Not Having a Pulmonary Embolism

Category          Count   Cumulative Count   Percent   Cumulative Percent   Graph of Percent
0.40 or less        33           33            4.39           4.39          /
0.40 thru 0.49      94          127           12.52          16.91          /////
0.50 thru 0.59     160          287           21.30          38.22          ////////
0.60 thru 0.69     161          448           21.44          59.65          ////////
0.70 thru 0.79     120          568           15.98          75.63          //////
0.80 thru 0.89      89          657           11.85          87.48          ////
0.90 thru 0.99      39          696            5.19          92.68          //
1.00 thru 1.09      28          724            3.73          96.40          /
1.10 thru 1.19      15          739            2.00          98.40          /
1.20 or higher      12          751            1.60         100.00          /

B. Shock Index in Patients Having a Pulmonary Embolism

Category          Count   Cumulative Count   Percent   Cumulative Percent   Graph of Percent
0.40 or less         5            5            2.76           2.76          /
0.40 thru 0.49      10           15            5.52           8.29          ///
0.50 thru 0.59      38           53           20.99          29.28          ////////
0.60 thru 0.69      38           91           20.99          50.28          ////////
0.70 thru 0.79      35          126           19.34          69.61          ///////
0.80 thru 0.89      13          139            7.18          76.80          ///
0.90 thru 0.99      21          160           11.60          88.40          ////
1.00 thru 1.09       9          169            4.97          93.37          /
1.10 thru 1.19       4          173            2.21          95.58          /
1.20 or higher       8          181            4.42         100.00          /

Source: Data, used with permission of the authors and publisher, Kline JA, Nelson RD, Jackson RE, Courtney DM: Criteria for the safe use of D-dimer testing in emergency department patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144–152. Table produced with NCSS; used with permission.

Figure 3–5 is a histogram based on the frequencies for patients who had a PE with a frequency polygon superimposed on it. It demonstrates that frequency polygons are constructed by connecting the midpoints of the columns of a histogram. Therefore, the same guidelines hold for constructing frequency polygons as for constructing frequency tables and histograms. Note that the line extends from the midpoint of the first and last columns to the X-axis in order to close up both ends of the distribution and indicate zero frequency of any values beyond the extremes. Because frequency polygons are based on a histogram, they also portray area.
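The construction just described is easy to verify with standard plotting software. The sketch below (Python with NumPy and Matplotlib, tools not used in the original study; the simulated values are hypothetical stand-ins for shock index) draws a histogram and superimposes the polygon through the column midpoints, closing it at zero frequency one interval beyond each extreme.

import numpy as np
import matplotlib.pyplot as plt

values = np.random.default_rng(1).normal(0.7, 0.15, 181)   # stand-in data
counts, edges = np.histogram(values, bins=10)
midpoints = (edges[:-1] + edges[1:]) / 2                    # centers of the columns
width = np.diff(edges)[0]

plt.bar(midpoints, counts, width=width, alpha=0.3)          # the histogram
# Close the polygon at zero frequency beyond the first and last columns:
x = np.r_[midpoints[0] - width, midpoints, midpoints[-1] + width]
y = np.r_[0, counts, 0]
plt.plot(x, y, marker="o")                                  # the frequency polygon
plt.xlabel("Shock index")
plt.ylabel("Frequency")
plt.show()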

Graphs Comparing Two or More Groups

Merely looking at the numbers in Table 3–8 is insufficient for deciding whether the distributions of shock index are similar for patients with and without a PE. Several methods are useful for comparing distributions.

Box plots are very effective when there is more than one group and are shown for shock index among patients with and without a PE in Figure 3–6. The distributions of the shock index are similar, although more variability exists in patients without a PE, and the median index is slightly higher in those who did have a PE. Does a difference exist between the two groups? We will have to wait until Chapter 6 to learn the answer.


Percentage polygons are also useful for comparing two frequency distributions. Percentage polygons for shock index in both patients with and without PE are illustrated in Figure 3–7. Frequencies must be converted to percentages when the groups being compared have unequal numbers of observations, and this conversion has been made for Figure 3–7. The distribution of shock index does not appear to be very different for the two patient groups; most of the area in one polygon is overlapped by that in the other. Thus, the visual message of box plots and frequency polygons is consistent.


Another type of graph often used in the medical literature is an error bar plot. Figure 3–8 contains error bars for patients with and without a PE. The circle designates the mean, and the bars illustrate the standard deviation, although some authors use the mean and standard error (a value smaller than the standard deviation, discussed in Chapter 4). We recommend using standard deviations and discuss this issue further in Chapter 4. The error bars indicate the similarity of the distributions, just as the percentage polygons and the box plots do.

Look at Figures 3–6, 3–7, and 3–8 and decide which you think provides the most useful information.

Summarizing Nominal & Ordinal Data with Numbers

When observations are measured on a nominal, or categorical, scale, the methods just discussed are not appropriate. Characteristics measured on a nominal scale do not have numerical values but are counts or frequencies of occurrence. The study on domestic violence examined a number of characteristics about physicians, including previous training in domestic violence, to help explain differences in screening behavior (Lapidus et al, 2002). Both screening and previous training are dichotomous, or binary, meaning that only two categories are possible. In this section, we examine measures that can be used with such observations.

Ways to Describe Nominal Data

Nominal data can be measured using several methods: proportions, percentages, ratios, and rates. To illustrate these measures, we will use the numbers of physicians who screened patients for domestic violence based on whether they had previous training; the data are given in Table 3–9.

Table 3–9. Physician Screening Prevalence by Demographic, Practice, and Domestic Violence Training Characteristics for 438 Respondents.



                           Screen
                        Yes     No    Total
Location of practice
  Urban                  97     28     125
  Suburban              199     99     298
  Rural                  34     11      45
Type of practice
  Private               261    110     371
  Other                  40     12      52
Teaching residents
  Yes                   154     62     216
  No                    175     76     251
Previous DV training
  Yes                   175     27     202
  No                    155    111     266

Source: Data, used with permission of the authors and publisher, Lapidus G, Cooke MB, Gelven E, Sherman K, Duncan M, Bancol L: A statewide survey of domestic violence screening behaviors among pediatricians and family physicians. Arch Pediatr Adolesc Med 2002;156:332–336. Table produced with Microsoft Excel.

Proportions and Percentages

A proportion is the number, a, of observations with a given characteristic (such as those who screened for domestic violence) divided by the total number of observations, a + b, in a given group (such as those who had previous training). That is,

proportion = a / (a + b)

A proportion is always defined as a part divided by the whole and is useful for ordinal and numerical data as well as nominal data, especially when the observations have been placed in a frequency table. In the domestic violence study, the proportion of physicians trained in domestic violence who subsequently screened patients is 175/202 = 0.866, and the proportion without training who subsequently screened patients is 155/266 = 0.583.

A percentage is simply the proportion multiplied by 100%.

Ratios and Rates

A ratio is the number of observations in a group with a given characteristic divided by the number of observations without the given characteristic:

ratio = a / b

A ratio is always defined as a part divided by another part. For example, among physicians who had previous training, the ratio of those who screened patients to those who did not is 175/27 = 6.481. Other familiar ratios in medicine include ratios of the three components of cholesterol (HDL, LDL, triglycerides), such as the LDL/HDL ratio.

Rates are similar to proportions except that a multiplier (eg, 1000, 10,000, or 100,000) is used, and they are computed over a specified period of time. The multiplier is called the base, and the formula is

rate = [a / (a + b)] x base

For example, if a study lasted exactly 1 year and the proportion of patients with a given condition was 0.002, the rate per 10,000 patients would be (0.002) x (10,000), or 20 per 10,000 patients per year.
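These three measures differ only in their denominators and multipliers, which a few lines of code make explicit. The following is a minimal sketch (the function names are ours, not from any statistical package), using the screening counts from Table 3–9 and the rate example above.

def proportion(a, b):
    # part divided by the whole
    return a / (a + b)

def ratio(a, b):
    # part divided by another part
    return a / b

def rate(a, b, base=10_000):
    # proportion times a base, for a specified period of observation
    return proportion(a, b) * base

print(proportion(175, 27))   # 0.866: trained physicians who screened
print(ratio(175, 27))        # 6.481: screeners per nonscreener among the trained
print(rate(2, 998))          # 20 per 10,000 per year, as in the example above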

Vital Statistics Rates

Rates are very important in epidemiology and evidence-based medicine; they are the basis of the calculation of vital statistics, which describe the health status of populations. Some of the most commonly used rates are briefly defined in the following sections.

Mortality Rates

Mortality rates provide a standard way to compare numbers of deaths occurring in different populations, deaths due to different diseases in the same population, or deaths at different periods of time. The numerator in a mortality rate is the number of people who died during a given period of time, and the denominator is the number of people who were at risk of dying during the same period. Because the denominator is often difficult to obtain, the number of people alive in the population halfway through the time period is frequently used as an estimate. Table 3–10 gives death data from Vital Statistics of the United States.

Table 3–10. Number of Deaths, Death Rates, and Age-Adjusted Death Rates, by Race and Sex: United States 1987–1996a



Number of Deaths in the United States: 1987–1996

                All Races                              White                               Black
Year    Both Sexes      Male     Female     Both Sexes     Male     Female     Both Sexes     Male     Female
1996     2,314,690  1,163,569  1,151,121     1,992,966   991,984  1,000,982       282,089   149,472    132,617
1995     2,312,132  1,172,959  1,139,173     1,987,437   997,277    990,160       286,401   154,175    132,226
1994     2,278,994  1,162,747  1,116,247     1,959,875   988,823    971,052       282,379   153,019    129,360
1993     2,268,553  1,161,797  1,106,756     1,951,437   988,329    963,108       282,151   153,502    128,649
1992     2,175,613  1,122,336  1,053,277     1,873,781   956,957    916,824       269,219   146,630    122,589
1991     2,169,518  1,121,665  1,047,853     1,868,904   956,497    912,407       269,525   147,331    122,194
1990     2,148,463  1,113,417  1,035,046     1,853,254   950,812    902,442       265,498   145,359    120,139
1989     2,150,466  1,114,190  1,036,276     1,853,841   950,852    902,989       267,642   146,393    121,249
1988     2,167,999  1,125,540  1,042,459     1,876,906   965,419    911,487       264,019   144,228    119,791
1987     2,123,323  1,107,958  1,015,365     1,843,067   953,382    889,685       254,814   139,551    115,263

Death Rates in the United States per 100,000: 1987–1996

                All Races                              White                               Black
Year    Both Sexes      Male     Female     Both Sexes     Male     Female     Both Sexes     Male     Female
1996         872.5     896.4      849.7         906.9     918.1      896.2         842.0     939.9      753.5
1995         880.0     914.1      847.3         911.3     932.1      891.3         864.2     980.7      759.0
1994         875.4     915.0      837.6         905.4     931.6      880.1         864.3     987.8      752.9
1993         880.0     923.5      838.6         908.5     938.8      879.4         876.8   1,006.3      760.1
1992         852.9     901.6      806.5         880.0     917.2      844.3         850.5     977.5      736.2
1991         860.3     912.1      811.0         886.2     926.2      847.7         864.9     998.7      744.5
1990         863.8     918.4      812.0         888.0     930.9      846.9         871.0   1,008.0      747.9
1989         871.3     926.3      818.9         893.2     936.5      851.8         887.9   1,026.7      763.2
1988         886.7     945.1      831.2         910.5     957.9      865.3         888.3   1,026.1      764.6
1987         876.4     939.3      816.7         900.1     952.7      849.8         868.9   1,006.2      745.7

Age-Adjusted Death Rates in the United States per 100,000: 1987–1996

                All Races                              White                               Black
Year    Both Sexes      Male     Female     Both Sexes     Male     Female     Both Sexes     Male     Female
1996         491.6     623.7      381.0         466.8     591.4      361.9         738.3     967.0      561.0
1995         503.9     646.3      385.2         476.9     610.5      364.9         765.7   1,016.7      571.0
1994         507.4     654.6      385.2         479.8     617.9      364.9         772.1   1,029.9      572.0
1993         513.3     664.9      388.3         485.1     627.5      367.7         785.2   1,052.2      578.8
1992         504.5     656.0      380.3         477.5     620.9      359.9         767.5   1,026.9      568.4
1991         513.7     669.9      386.5         486.8     634.4      366.3         780.7   1,048.9      575.1
1990         520.2     680.2      390.6         492.8     644.3      369.9         789.2   1,061.3      581.6
1989         528.0     689.3      397.3         499.6     652.2      376.0         805.9   1,082.8      594.3
1988         539.9     706.1      406.1         512.8     671.3      385.3         809.7   1,083.0      601.0
1987         539.2     706.8      404.6         513.7     674.2      384.8         796.4   1,063.6      592.4

aCrude rates on an annual basis per 100,000 population in specified group; age-adjusted rates per 100,000 U.S. standard million population. Rates are based on populations enumerated as of April 1 for census years and estimated as of July 1 for all other years. Excludes deaths of nonresidents of the United States.

Source: Adapted, with permission, from Peters KD, Kochanek KD, Murphy SL:Deaths: Final data for 1996. National Vital Statistics Report: Vol. 47, no. 9, p. 16. National Center for Health Statistics, 1998.

A crude rate is a rate computed over all individuals in a given population. For example, the crude annual mortality rate in the entire population from Table 3–10 is 872.5 per 100,000 in 1996. The sex-specific mortality rate for males is 896.4 during that same year, and for females it is 849.7 per 100,000. Comparing the sex-specific mortality rates across the years given in Table 3–10, the mortality rate appears to have increased for women. Does this make sense, or could there be another explanation? Consider that a larger number of older women may have been living in 1996 than in previous years. This hypothesis can be examined by adjusting the mortality rates for the age of people at risk. When age-adjusted rates are examined in Table 3–10, we see that the rates have been declining as we would expect. We talk more about adjusting rates in the section of that title.

Cause-specific mortality rates measure deaths in a population from a specific disease or adverse event. Comparing cause-specific mortality rates over a period of time helps epidemiologists to determine possible predisposing factors in the development of disease as well as to make projections about future trends.

Other commonly used mortality rates are infant mortality rate and case fatality rate. The infant mortality rate, sometimes used as an indicator of the level of general health care in a population, is the number of infants who die before 1 year of age per 1000 live births. The case fatality rate is the number of deaths from a specific disease occurring in a given period divided by the number of individuals with the specified disease during the same period.

Morbidity Rates

Morbidity rates are similar to mortality rates, but many epidemiologists think they provide a more direct measure of health status in a population. The morbidity rate is the number of individuals who develop a disease in a given period of time divided by the number of people in the population at risk.

Prevalence and incidence are two important measures frequently used in medicine and epidemiology. Prevalence is defined as the number of individuals with a given disease at a given point in time divided by the population at risk for that disease at that time. Incidence is defined as the number of new cases that have occurred during a given interval of time divided by the population at risk at the beginning of the time interval. (Because prevalence does not involve a period of time, it is actually a proportion, but is often mistakenly termed a rate.) The term "incidence" is sometimes used erroneously when the term "prevalence" is meant. One way to distinguish between them is to look for units: An incidence rate should always be expressed in terms of a unit of time.

We can draw an analogy between prevalence and incidence and two of the study designs discussed in Chapter 2. Prevalence is like a snapshot in time, as is a cross-sectional study. In fact, some cross-sectional studies are called prevalence studies by epidemiologists. Incidence, on the other hand, requires a period of time to transpire, similar to cohort studies. Recall that cohort studies begin at a given time and continue to examine outcomes over the specified length of the study.

Epidemiologists use prevalence and incidence rates to evaluate disease patterns and make future projections. For example, diabetes mellitus has an increasing prevalence even though the annual incidence rate of approximately 230 cases per 100,000 has remained relatively stable over the past several years. The reason for the difference is that once this disease occurs, an individual continues to have diabetes the remainder of his or her life; but advances in care of diabetic patients have led to greater longevity for these patients. In contrast, for diseases with a short duration (eg, influenza) or with an early mortality (eg, pancreatic cancer), the incidence rate is generally larger than the prevalence.
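The distinction is easy to keep straight in code, because only incidence carries a time period. Here is a small sketch with hypothetical counts (the incidence figure echoes the diabetes example above, but the population numbers are invented for illustration):

existing_cases = 1_200    # people who have the disease at a point in time
new_cases = 230           # cases that developed during one year
population = 100_000      # population at risk

prevalence = existing_cases / population   # a proportion; no time unit
incidence = new_cases / population         # per year, by construction

print(f"prevalence = {prevalence:.3f}")                          # 0.012
print(f"incidence = {incidence * 100_000:.0f} per 100,000/yr")   # 230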

Adjusting Rates

We can use crude rates to make comparisons between two different populations only if the populations are similar in all characteristics that might affect the rate. For example, if the populations differ on confounding factors such as age, gender, or race, then age-, gender-, or race-specific rates must be used, or the crude rates must be adjusted; otherwise, comparisons will not be valid.

Rates in medicine are commonly adjusted for age. Often, two populations of interest have different age distributions; yet many characteristics studied in medicine are affected by age, becoming either more or less frequent as individuals grow older. If the two populations are to be compared, the rates must be adjusted to reflect what they would be had their age distributions been similar.

Direct Method of Adjusting Rates

As an illustration, suppose a researcher compares the infant mortality rates from a developed country with those from a developing country and concludes that the mortality rate in the developing country is almost twice as high as the rate in the developed country. Is this conclusion misleading? That is, are confounding factors that affect infant mortality distributed differently in the two countries? A relationship between birthweight and mortality certainly exists, and in this example, a valid comparison of mortality rates requires that the distribution of birthweight be similar in the two countries. Hypothetical data are given in Table 3–11.

The crude infant mortality rate for the developed country is 12.0 per 1000 infants; for the developing country, it is 23.9 per 1000. The specific rates for the developing country are higher in all birthweight categories. However, the two distributions of birthweight are not the same: The percentage of low-birthweight infants (< 2500 g) is more than twice as high in the developing country as in the developed country. Because birthweight of infants and infant mortality are related, we cannot determine how much of the difference in crude mortality rates between the countries is due to differences in weight-specific mortality and how much is due to the developing country's higher proportion of low-birthweight babies. In this case, the mortality rates must be standardized or adjusted so that they are independent of the distribution of birthweight.c

cOf course factors other than birthweight may affect mortality, and it is important to remember that correcting for one factor may not correct for others.

Determining an adjusted rate is a relatively simple process when information such as that in Table 3–11 is available. For each population, we must know the specific rates. Note that the crude rate in each country is actually a weighted average of the specific rates, with the number of infants born in each birthweight category used as the weights. For example, the crude mortality rate in the developed country is 2400/200,000 = 0.012, or 12 per 1000, and is equal to

[20,000(0.0435) + 30,000(0.0160) + 150,000(0.0070)] / 200,000 = (870 + 480 + 1050) / 200,000 = 2400/200,000

Table 3–11. Infant Mortality Rate Adjustment: Direct Method.



                        Developed Country                         Developing Country
                 Infants Born          Deaths              Infants Born          Deaths
Birthweight    N (in 1000s)    %     No.     Rate        N (in 1000s)    %     No.     Rate
< 1500 g             20       10      870    43.5              30       21     1860    62.0
1500–2499 g          30       15      480    16.0              45       32      900    20.0
≥ 2500 g            150       75     1050     7.0              65       47      585     9.0
Total               200              2400    12.0             140              3345    23.9

Because the goal of adjusting rates is to have them reflect similar distributions, the numbers in each category from one population, called the reference population, are used as the weights to form weighted averages for both populations. Which population is chosen as the standard does not matter; in fact, a set of frequencies corresponding to a totally separate reference population may be used. The point is that the same set of numbers must be applied to both populations.

For example, if the numbers of infants born in each birthweight category in the developed country are used as the standard and applied to the specific rates in the developing country, we obtain

[20,000(0.0620) + 30,000(0.0200) + 150,000(0.0090)] / 200,000 = (1240 + 600 + 1350) / 200,000 = 3190/200,000 = 15.95 per 1000

The crude mortality rate in the developing country would therefore be 15.95 per 1000 (rather than 23.9 per 1000) if the proportions of infant birthweight were distributed as they are in the developed country.

To use this method of adjusting rates, you must know the specific rates for each category in the populations to be adjusted and the frequencies in the reference population for the factor being adjusted. This method is known as the direct method of rate standardization.
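Because the direct method is a weighted average, it reduces to one line of arithmetic per category. Here is a minimal sketch in Python using the numbers from Table 3–11 (the function name is ours):

def direct_adjusted_rate(reference_counts, specific_rates_per_1000):
    # Weight the comparison population's specific rates by the
    # reference population's category counts.
    expected = sum(n * r / 1000 for n, r in zip(reference_counts, specific_rates_per_1000))
    return 1000 * expected / sum(reference_counts)   # adjusted rate per 1000

developed_births = [20_000, 30_000, 150_000]   # reference weights (Table 3-11)
developing_rates = [62.0, 20.0, 9.0]           # developing country's specific rates

print(direct_adjusted_rate(developed_births, developing_rates))   # 15.95 per 1000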

Indirect Method of Adjusting Rates

Sometimes specific rates are not available in the populations being compared. If the frequencies of the adjusting factor, such as age or birthweight, are known for each population, and any set of specific rates is available (either for one of the populations being compared or for still another population), an indirect method may be used to adjust rates. The indirect method results in the standardized mortality ratio, defined as the number of observed deaths divided by the number of expected deaths.

To illustrate, suppose the distribution of birthweight is available for both the developed and the developing countries, but we have specific death rates only for another population, denoted the Standard Population in Table 3–12. The expected number of deaths is calculated in each population by using the specific rates from the standard population. For the developed country, the expected number of deaths is

20,000(0.050) + 30,000(0.020) + 150,000(0.010) = 1000 + 600 + 1500 = 3100

Table 3–12. Infant Mortality Rate Adjustment: Indirect Method.



                   Number of Infants Born (in 1000s)
Birthweight        Developed Country   Developing Country   Specific Death Rates per 1000 in Standard Population
< 1500 g                   20                  30                            50.0
1500–2499 g                30                  45                            20.0
≥ 2500 g                  150                  65                            10.0
Number of Deaths         2400                3345

In the developing country, the expected number of deaths is

30,000(0.050) + 45,000(0.020) + 65,000(0.010) = 1500 + 900 + 650 = 3050

The standardized mortality ratio (the observed number of deaths divided by the expected number) for the developed country is 2400/3100 = 0.77. For the developing country, the standardized mortality ratio is 3345/3050 = 1.1. If the standardized mortality ratio is greater than 1, as in the developing country, the population of interest has a mortality rate greater than that of the standard population. If the standardized mortality ratio is less than 1, as in the developed country, the mortality rate is less than that of the standard population. Thus, the indirect method allows us to make a relative comparison; in contrast, the direct method allows us to make a direct comparison. If rates for one of the populations of interest are known, these rates may be used; then the standardized mortality ratio for this population is 1.0.
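The same bookkeeping gives the indirect method. A minimal sketch using the counts and standard rates in Table 3–12 (again, the function name is ours):

def standardized_mortality_ratio(observed_deaths, counts, standard_rates_per_1000):
    # Expected deaths come from applying the standard population's
    # specific rates to this population's category counts.
    expected = sum(n * r / 1000 for n, r in zip(counts, standard_rates_per_1000))
    return observed_deaths / expected

standard_rates = [50.0, 20.0, 10.0]   # per 1000, from the standard population
developed = standardized_mortality_ratio(2400, [20_000, 30_000, 150_000], standard_rates)
developing = standardized_mortality_ratio(3345, [30_000, 45_000, 65_000], standard_rates)
print(round(developed, 2), round(developing, 2))   # 0.77 1.1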

Tables & Graphs for Nominal & Ordinal Data

We describe some of the more common methods for summarizing nominal and ordinal data in this section. To illustrate how to construct tables for nominal data, consider the observations given in Table 3–13 on the 34 patients with acquired hemophilia (Bossi, 1998) in Presenting Problem 4. The simplest way to present nominal data (or ordinal data, if there are not too many points on the scale) is to list the categories in one column of the table and the frequency (counts) or percentage of observations in another column. Table 3–14 shows a simple way of presenting data for the number of patients who did or did not have hematuria at the time of diagnosis of their hemophilia.

Table 3–13. Data on 34 Patients with Acquired Hemophilia Due to Factor VIII Inhibitors.



ID    Age   Sex     Ecchymoses   Hematoma   Hematuria   Factor VIII   RBC Units > 5
1      70   Men        Yes          No         Yes          5.0           Yes
2      70   Women      Yes          Yes        Yes          0.0           No
3      75   Women      Yes          Yes        No           1.0           Yes
4      93   Women      Yes          Yes        No           5.0           No
5      69   Men        No           No         Yes          5.0           No
6      85   Men        Yes          Yes        No           6.0           Yes
7      80   Women      No           No         No           1.0           No
8      26   Women      Yes          Yes        Yes          2.5           No
9      33   Women      Yes          Yes        Yes          3.5           Yes
10     81   Men        Yes          Yes        No           1.3           No
11     42   Women      Yes          Yes        No          30.0           No
12     74   Men        Yes          Yes        Yes          0.0           Yes
13     55   Men        Yes          Yes        Yes          3.0           Yes
14     86   Women      Yes          Yes        Yes          3.0           No
15     71   Men        Yes          Yes        No           6.0           No
16     89   Men        Yes          Yes        No           5.0           Yes
17     81   Women      Yes          Yes        No           0.0           Yes
18     82   Women      Yes          Yes        Yes          1.0           No
19     82   Women      Yes          Yes        No           3.0           No
20     71   Women      Yes          Yes        Yes          1.0           Yes
21     32   Women      Yes          Yes        Yes          2.0           Yes
22     30   Women      Yes          Yes        No           2.0           Yes
23     29   Women      Yes          Yes        No           0.0           No
24     78   Men        Yes          Yes        No          13.0           No
25     58   Men        Yes          Yes        Yes          1.0           No
26     26   Women      Yes          Yes        No           0.0           Yes
27     51   Men        Yes          Yes        No           3.0           No
28     69   Men        Yes          Yes        No           0.0           Yes
29     67   Men        Yes          Yes        No           1.0           No
30     44   Men        Yes          Yes        No           3.0           No
31     59   Women      Yes          Yes        No           3.0           Yes
32     59   Women      Yes          Yes        No           6.0           No
33     40   Men        Yes          Yes        Yes          3.0           Yes
34     22   Women      Yes          Yes        No           1.0           No

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J, Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

Table 3–14. Contingency Table for Frequency of Hematuria in Patients with Acquired Hemophilia.



Hematuria   Number of Patients
Yes                 13
No                  21

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J, Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

When two characteristics on a nominal scale are examined, a common way to display the data is in a contingency table, in which observations are classified according to several factors. Suppose we want to know the number of men and women who had hematuria at the time of diagnosis. The first step is to list the categories to appear in the table: men with and without hematuria and women with and without hematuria (Table 3–15). Tallies are placed for each patient who meets the criterion. Patient 1 has a tally in the cell "Men with hematuria"; patient 2 has a tally in the cell "Women with hematuria"; and so on. Tallies for the first seven patients are listed in Table 3–15.

Table 3–15. Step 1 in Constructing Contingency Table for Men and for Women with and without Hematuria.



Category                    Tally
Men with hematuria          //
Men without hematuria       /
Women with hematuria        /
Women without hematuria     ///

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J, Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

The sum of the tallies in each cell is then used to construct a contingency table such as Table 3–16, which contains cell counts for all 34 patients in the study. Percentages are often given along with the cell counts.

Table 3–16. Contingency Table for Men and for Women with and without Hematuria.



Sex       No Hematuria   Hematuria
Men             9             6
Women          12             7

Source: Data, used with permission, from Bossi P, Cabane J, Ninet J, Dhote R, Hanslik T, Chosidow O, et al: Acquired hemophilia due to factor VIII inhibitors in 34 patients. Am J Med 1998;105:400–408.

For a graphic display of nominal or ordinal data, bar charts are commonly used. In a bar chart, counts or percentages of the characteristic in different categories are shown as bars. The investigators in this example could have used a bar chart to present the number of patients with and without hematuria, as illustrated in Figure 3–9. The categories of hematuria (yes or no) are placed along the horizontal, or X-axis, and the number of patients along the vertical, or Y-axis. Bar charts may also have the categories along the vertical axis and the numbers along the horizontal axis.

Other graphic devices such as pie charts and pictographs are often used in newspapers, magazines, and advertising brochures. They are occasionally used in the health field to display such resource information as the portion of the gross national product devoted to health expenditures or the geographic distribution of primary-care physicians.

Describing Relationships between Two Characteristics

Much of the research in medicine concerns the relationship between two or more characteristics. The following discussion focuses on examining the relationship between two variables measured on the same scale when both are numerical, both are ordinal, or both are nominal.

The Relationship Between Two Numerical Characteristics

In Presenting Problem 2, Hébert and colleagues (1997) wanted to estimate the relationship between the scores patients had at different administrations of the Functional Autonomy Measurement System (SMAF). The correlation coefficient (sometimes called the Pearson product moment correlation coefficient, named for the statistician who defined it) is one measure of the relationship between two numerical characteristics, symbolized by X and Y. Table 3–17 gives the information needed to calculate the correlation between the mental function scores at baseline and at the end of 2 years for women 85 years old and older (for the 51 subjects who had both of these measures). The formula for the correlation coefficient, symbolized by r, is

r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² Σ(Y – Ȳ)²]

Table 3–17. Calculation for Correlation Coefficient between Mental Ability at Time 1 (X) and Time 3 (Y) for Women Patients 85 Years of Age or Older.a



Patient    X          Y         (X – X̄)   (Y – Ȳ)   (X – X̄)²   (Y – Ȳ)²   (X – X̄)(Y – Ȳ)
1          6.0000     4.0000     4.4118    1.9020    19.4640     3.6176      8.3912
2          0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
3          1.0000     0.0000    –0.5882   –2.0980     0.3460     4.4016      1.2340
22         2.0000     3.0000     0.4118    0.9020     0.1696     0.8136      0.3714
24         1.0000     0.0000    –0.5882   –2.0980     0.3460     4.4016      1.2340
42         2.0000     8.0000     0.4118    5.9020     0.1696    34.8336      2.4304
72         3.0000     4.0000     1.4118    1.9020     1.9932     3.6176      2.6852
103        1.0000     1.0000    –0.5882   –1.0980     0.3460     1.2056      0.6458
114        2.0000     0.0000     0.4118   –2.0980     0.1696     4.4016     –0.8640
121        0.0000     2.0000    –1.5882   –0.0980     2.5224     0.0096      0.1556
122        2.0000     3.0000     0.4118    0.9020     0.1696     0.8136      0.3714
123        1.0000     0.0000    –0.5882   –2.0980     0.3460     4.4016      1.2340
132        0.0000     4.0000    –1.5882    1.9020     2.5224     3.6176     –3.0208
151        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
159        8.0000     9.0000     6.4118    6.9020    41.1112    47.6376     44.2542
161        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
162        0.0000     7.0000    –1.5882    4.9020     2.5224    24.0296     –7.7854
173        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
183        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
188        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
220        0.0000     1.0000    –1.5882   –1.0980     2.5224     1.2056      1.7438
237        7.0000     1.0000     5.4118   –1.0980    29.2876     1.2056     –5.9422
241        3.0000     2.0000     1.4118   –0.0980     1.9932     0.0096     –0.1384
251        0.0000     2.0000    –1.5882   –0.0980     2.5224     0.0096      0.1556
266        3.0000     5.0000     1.4118    2.9020     1.9932     8.4216      4.0970
273        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
332        1.0000     0.0000    –0.5882   –2.0980     0.3460     4.4016      1.2340
347        0.0000     1.0000    –1.5882   –1.0980     2.5224     1.2056      1.7438
348        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
376        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
377        3.0000    12.0000     1.4118    9.9020     1.9932    98.0496     13.9796
396        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
425        5.0000     4.0000     3.4118    1.9020    11.6404     3.6176      6.4892
472        1.0000     1.0000    –0.5882   –1.0980     0.3460     1.2056      0.6458
501        3.0000     1.0000     1.4118   –1.0980     1.9932     1.2056     –1.5502
518        1.0000     3.0000    –0.5882    0.9020     0.3460     0.8136     –0.5306
526        1.0000     0.0000    –0.5882   –2.0980     0.3460     4.4016      1.2340
527        1.0000     1.0000    –0.5882   –1.0980     0.3460     1.2056      0.6458
531        2.0000     0.0000     0.4118   –2.0980     0.1696     4.4016     –0.8640
592        8.0000    11.0000     6.4118    8.9020    41.1112    79.2456     57.0778
604        3.0000     1.0000     1.4118   –1.0980     1.9932     1.2056     –1.5502
628        1.0000     1.0000    –0.5882   –1.0980     0.3460     1.2056      0.6458
634        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
638        1.0000     1.0000    –0.5882   –1.0980     0.3460     1.2056      0.6458
706        4.0000     3.0000     2.4118    0.9020     5.8168     0.8136      2.1754
714        0.0000     1.0000    –1.5882   –1.0980     2.5224     1.2056      1.7438
722        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
748        0.0000     1.0000    –1.5882   –1.0980     2.5224     1.2056      1.7438
755        1.0000     6.0000    –0.5882    3.9020     0.3460    15.2256     –2.2952
792        0.0000     0.0000    –1.5882   –2.0980     2.5224     4.4016      3.3320
793        3.0000     3.0000     1.4118    0.9020     1.9932     0.8136      1.2734
Sum       81.0000   107.0000     0.0018    0.0022   220.3529   428.5098    179.0588

aValues are reported to four decimal places to minimize round-off error.

Source: Data, used with permission of the authors and the publisher, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling, very elderly population. Am J Epidemiol 1997;145:935–944. Table produced with NCSS; used with permission.

As with the standard deviation, we give the formula and computation for illustration purposes only and, for that reason, use the definitional rather than the computational formula. Using the data from Table 3–17, we obtain a correlation of

r = 179.0588 / √[(220.3529)(428.5098)] = 179.0588 / 307.28 = 0.58

Interpreting Correlation Coefficients

What does a correlation of 0.58 between mental functioning at time 1 and 2 years later mean? (Correlations are generally reported to two decimal places.) Chapter 8 discusses methods used to tell whether a statistically significant relationship exists; for now, we will discuss some characteristics of the correlation coefficient that will help us interpret its numerical value.

The correlation coefficient always ranges from –1 to +1, with –1 describing a perfect negative linear (straight-line) relationship and +1 describing a perfect positive linear relationship. A correlation of 0 means no linear relationship exists between the two variables.

Sometimes the correlation is squared (r2) to form a useful statistic called the coefficient of determination or r-squared, and we recommend this practice. For the mental functioning data, the coefficient of determination is (0.58)2, or 0.34. This means that 34% of the variability in one of the measures, such as mental functioning at 2 years, may be accounted for (or predicted) by knowing the value of the other measure, mental functioning at baseline. Stated another way, if we know the value of an elderly woman's score on the mental functioning part of the SMAF and take that into consideration when examining the score 2 years later, the variance (standard deviation squared) of the score after 2 years would be reduced by 34%, or about one-third.
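For readers who want to see the definitional formula in action, here is a minimal sketch in Python (our own code, not NCSS or SPSS output). For illustration it uses the first seven pairs of scores from Table 3–17; any paired numerical observations would work.

import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)                         # sum of squares, X
    syy = sum((yi - mean_y) ** 2 for yi in y)                         # sum of squares, Y
    return sxy / math.sqrt(sxx * syy)

x = [6, 0, 1, 2, 1, 2, 3]   # time 1 scores, first seven patients in Table 3-17
y = [4, 0, 0, 3, 0, 8, 4]   # time 3 scores for the same patients
r = pearson_r(x, y)
print(round(r, 2), round(r ** 2, 2))   # r and the coefficient of determination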

Several other characteristics of the correlation coefficient deserve mention. The value of the correlation coefficient is independent of the particular units used to measure the variables. Suppose two medical students measure the heights and weights of a group of preschool children to determine the correlation between height and weight. They measure the children's height in centimeters and record their weight in kilograms, and they calculate a correlation coefficient equal to 0.70. What would the correlation be if they had used inches and pounds instead? It would, of course, still be 0.70, because the denominator in the formula for the correlation coefficient adjusts for the scale of the units.

The value of the correlation coefficient is markedly influenced by outlying values, just as is the standard deviation. Thus the correlation does not describe the relationship between two variables well when the distribution of either variable is skewed or contains outlying values. In this situation, a transformation of the data that changes the scale of measurement and moderates the effect of outliers (see Chapter 5) or the Spearman correlation can be used.

People first learning about the correlation coefficient often ask, "How large should a correlation be?" The answer depends on the application. For example, when physical characteristics are measured and good measuring devices are available, as in many physical sciences, high correlations are possible. Measurement in the biologic sciences, however, often involves characteristics that are less well defined and measuring devices that are imprecise; in such cases, lower correlations may occur. Colton (1974) gives the following crude rule of thumb for interpreting the size of correlations:

Correlations from 0 to 0.25 (or –0.25) indicate little or no relationship; those from 0.25 to 0.50 (or –0.25 to –0.50) indicate a fair degree of relationship; those from 0.50 to 0.75 (or –0.50 to –0.75) a moderate to good relationship; and those greater than 0.75 (or –0.75) a very good to excellent relationship.

Colton cautions against correlations higher than 0.95 in the biologic sciences because of the inherent variability in most biologic characteristics. When you encounter a high correlation, you should ask whether it is an error or an artifact or, perhaps, the result of combining two populations (illustrated in Chapter 8). An example of an artifact is when the number of pounds patients lose in the first week of a diet program is correlated with the number of pounds they lose during the entire 2-month program.

The correlation coefficient measures only a straight-line relationship; two characteristics may, in fact, have a strong curvilinear relationship, even though the correlation is quite small. Therefore, when you analyze relationships between two characteristics, always plot the data as we do in the section titled, "Graphs for Two Characteristics." A plot will help you detect outliers and skewed distributions.

Finally, "correlation does not imply causation." The statement that one characteristic causes another must be justified on the basis of experimental observations or logical argument, not because of the size of a correlation coefficient.

Use the CD-ROM [available only with the book] and find the correlation between shock index and heart rate using the Kline and colleagues study (2002). Interpret the correlation using the guidelines just described.

The Relationship Between Two Ordinal Characteristics

The Spearman rank correlation, sometimes called Spearman's rho (also named for the statistician who defined it), is frequently used to describe the relationship between two ordinal (or one ordinal and one numerical) characteristics. It is also appropriate to use with numerical observations that are skewed with extreme observations. The calculation of the Spearman rank correlation, symbolized as rs, involves rank-ordering the values on each of the characteristics from lowest to highest; the ranks are then treated as though they were the actual values themselves. Although the formula is simple when no ties occur in the values, the computation is quite tedious. Because the calculation is available on many computer programs, we postpone its illustration until Chapter 8, where it is discussed in greater detail.

The Relationship Between Two Nominal Characteristics

In studies involving two characteristics, the primary interest may be in whether they are significantly related (discussed in Chapter 6) or the magnitude of the relationship, such as the relationship between a risk factor and occurrence of a given outcome. Two ratios used to estimate such a relationship are the relative risk and the odds ratio, both often referred to as risk ratios. For example, in Presenting Problem 3, the investigators may wish to learn whether instruction in domestic violence reduces the "risk" that physicians neglect to screen for this condition. In the context of this discussion, we introduce some of the important concepts and terms that are increasingly used in the medical and health literature, including the useful notion of the number of patients who need to be treated in order to observe one positive outcome.

Experimental and Control Event Rates

Important concepts in the computation of measures of risk are called the event rates. Using the notation in Table 3–18, we are interested in the event of a disease occurring. The experimental event rate (EER) is the proportion of people with the risk factor who have or develop the disease, or A/(A + B). The control event rate (CER) is the proportion of people without the risk factor who have or develop the disease, or C/(C + D).

Table 3–18. Table Arrangement and Formulas for Several Important Measures of Risk.



                        Disease   No Disease   Total
Risk factor present        A          B        A + B
Risk factor absent         C          D        C + D
Total                    A + C      B + D

The Relative Risk

The relative risk, or risk ratio, of a disease, symbolized by RR, is the ratio of the incidence in people with the risk factor (exposed persons) to the incidence in people without the risk factor (nonexposed persons). It can therefore be found by dividing the EER by the CER.

The Physicians' Health Study (Steering Committee of the Physicians' Health Study Research Group, 1989) is a classic study undertaken to learn whether aspirin in low doses (325 mg every other day) reduces the mortality from cardiovascular disease. The participants in this clinical trial were 22,071 healthy male physicians who were randomly assigned to receive aspirin or placebo and were evaluated over an average period of 60 months. Table 3–19 gives data on MI for physicians taking aspirin and physicians taking a placebo. Physicians taking aspirin and those taking the placebo were followed to learn the number in each group who had an MI. In this study, taking aspirin assumes the role of the risk factor. The EER is the incidence of MI in physicians who took aspirin, or 139/11,037 = 0.0126; the CER is the incidence of MI in those who took a placebo, or 239/11,034 = 0.0217. The relative risk of MI with aspirin, compared with MI with placebo, is therefore

RR = EER / CER = 0.0126 / 0.0217 = 0.58

Table 3–19. Confirmed Cardiovascular End Points in the Aspirin Component of the Physicians' Health Study, According to Treatment Group.



                                  Aspirin Group   Placebo Group
Number of patients                     11,037          11,034
End Point
Myocardial infarction
  Fatal                                    10              26
  Nonfatal                                129             213
  Total                                   139             239
  Person-years of observation        54,560.0        54,355.7
Stroke
  Fatal                                     9               6
  Nonfatal                                110              92
  Total                                   119              98
  Person-years of observation        54,650.3        54,635.8

Source: Adapted and reproduced, with permission, from Steering Committee of the Physicians' Health Study Research Group: Final report on the aspirin component of the ongoing Physicians' Health Study. N Engl J Med 1989; 321:129–135.

Because fewer MIs occurred among the group taking aspirin than those taking the placebo, the relative risk is less than 1. If we take the reciprocal and look at the relative risk of having an MI for physicians in the placebo group, the relative risk is 1/0.58 = 1.72. Thus, physicians in the placebo group were 1.7 times more likely to have an MI than physicians in the aspirin group.

The relative risk is calculated only from a cohort study or a clinical trial in which a group of subjects with the risk factor and a group without it are first identified and then followed through time to determine which persons develop the outcome of interest. In this situation, the investigator determines the number of subjects in each group.

Absolute Risk Reduction

The absolute risk reduction (ARR) provides a way to assess the reduction in risk compared with the baseline risk. In the physician aspirin study (see Table 3–19), the experimental event rate for an MI from any cause was 0.0126 in the aspirin group, and the control event rate was 0.0217 in the placebo group. The ARR is the absolute value of the difference between these two event rates:

ARR = |EER – CER| = |0.0126 – 0.0217| = 0.0091

A good way to interpret these numbers is to think about them in terms of events per 10,000 people. Then the risk of MI is 126 in a group taking aspirin and 217 in a group taking placebo, and the absolute risk reduction is 91 per 10,000 people.

Number Needed to Treat

An added advantage of interpreting risk data in terms of absolute risk reduction is that its reciprocal, 1/ARR, is the number needed to treat (NNT) in order to prevent one event. The number of people that need to be treated to avoid one MI is then 1/0.0091, or 109.9 (about 110 people). This type of information helps clinicians evaluate the relative risks and benefits of a particular treatment. Based on the risks associated with taking aspirin, do you think it is a good idea to prescribe aspirin for 110 people in order to prevent one of them from having an MI? The articles by Glasziou and coworkers (1998) and Sackett and coworkers (2000) contain excellent discussions of this topic; Nuovo and coworkers (2002) discuss the need to include number needed to treat in reports of clinical trials.

Absolute Risk Increase and Number Needed to Harm

Some treatments or procedures increase the risk for a serious undesirable side effect or outcome. In this situation, the (absolute value of the) difference between the EER and the CER is termed the absolute risk increase (ARI). He and colleagues (1998), in their report of a meta-analysis of randomized trials of aspirin use, found an absolute risk reduction in MI of 137 per 10,000 persons, a result even larger than in the physician aspirin study. They also looked at the outcome of stroke and reported an absolute risk reduction in ischemic stroke of 39 in 10,000. Based on their results, the NNT for the prevention of MI is 1/0.0137, or 72.99 (about 73), and the NNT for the prevention of ischemic stroke is 1/0.0039, or 256.41 (about 257). At the same time, aspirin therapy resulted in an absolute risk increase in hemorrhagic stroke of 12 in every 10,000 persons. The reciprocal of the absolute risk increase, 1/ARI, is called the number needed to harm (NNH). Based on the report by He and colleagues, for hemorrhagic stroke the number needed to harm is 1/0.0012, or 833. Based on these numbers, the authors concluded that the overall benefits from aspirin therapy outweigh the risk for hemorrhagic stroke.

Relative Risk Reduction

A related concept, the relative risk reduction (RRR), is also presented in the literature. This measure gives the amount of risk reduction relative to the baseline risk; that is, the EER minus the CER all divided by the control (baseline) event rate, CER. The RRR in the physician aspirin study is

RRR = (0.0217 – 0.0126) / 0.0217 = 0.0091 / 0.0217 = 0.4194

or approximately 42%. The relative risk reduction tells us that, relative to the baseline risk of 217 MIs in 10,000 people, giving aspirin reduces the risk by 42%.

Many clinicians feel that the absolute risk reduction is a more valuable index than the relative risk reduction because its reciprocal is the number needed to treat. If a journal article gives only the relative risk reduction, it can (fairly easily) be converted to the absolute risk reduction by multiplying by the control event rate, a value that is almost always given in an article. For instance, 0.4194 x 0.0217 is 0.0091, the same value we calculated earlier for the ARR.
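All of these indices follow from the two event rates, so they can be computed together. Here is a minimal sketch using the MI counts in Table 3–19 (the variable names are ours):

eer = 139 / 11_037    # experimental event rate (aspirin group)
cer = 239 / 11_034    # control event rate (placebo group)

rr = eer / cer              # relative risk
arr = abs(eer - cer)        # absolute risk reduction
nnt = 1 / arr               # number needed to treat
rrr = (cer - eer) / cer     # relative risk reduction

print(f"RR = {rr:.2f}, ARR = {arr:.4f}, NNT = {nnt:.0f}, RRR = {rrr:.2f}")
# RR = 0.58, ARR = 0.0091, NNT = 110, RRR = 0.42, matching the text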

The Odds Ratio

The odds ratio provides a way to look at risk in case–control studies. To discuss the odds ratio, we use the study by Ballard and coworkers (1998)d in which the use of antenatal thyrotropin-releasing hormone was studied. Data from this study are given in Table 3–20. The odds ratio (OR) is the odds that a person with an adverse outcome was at risk divided by the odds that a person without an adverse outcome was at risk. The odds ratio is easy to calculate when the observations are given in a 2 x 2 table. The numbers of infants developing respiratory distress syndrome in Table 3–20 are rearranged and given in Table 3–21.

dThe authors could have presented the relative risk because the study was a clinical trial, but they chose to give the odds ratio as well. At one time, use of the odds ratio was generally reserved for case–control studies. One of the statistical methods used increasingly in medicine, logistic regression, can be interpreted in terms of odds ratios. We discuss this method in detail in Chapter 10. We discuss the issue of statistical significance and risk ratios in Chapter 8.

Table 3–20. Outcomes of Infants in the Thyrotropin-Releasing Hormone and Placebo Groups.



                                            Infants at Risk                                Infants Not at Risk
Outcome                          TRH         Placebo       Odds Ratio        TRH         Placebo       Odds Ratio
                               (N = 392)    (N = 377)      (95% CI)        (N = 171)    (N = 194)      (95% CI)
Respiratory distress syndrome      260          244       1.1 (0.8–1.5)         5            13       0.4 (0.1–1.3)
Death ≤ 28 days after delivery      43           42       1.0 (0.6–1.6)         2             1       2.3 (0.1–135)
Chronic lung disease or death
  ≤ 28 days after delivery         175          157       1.1 (0.8–1.3)         3             2       1.7 (0.2–20.7)

Source: Data, used with permission, from Table 2 in Ballard RA, Ballard PL, Cnaan A, Pinto-Martin J, Davis DJ, Padbury JF, et al: Antenatal thyrotropin-releasing hormone to prevent lung disease in preterm infants. N Engl J Med 1998;338:493–498.

Table 3–21. Data for Odds Ratio for Infants at 32 Weeks or Fewer of Gestation.



Group      With Respiratory Distress   Without Respiratory Distress   Total
TRH                   260                          132                 392
Placebo               244                          133                 377
Total                 504                          265

Source: Data, used with permission, from Table 2 in Ballard RA, Ballard PL, Cnaan A, Pinto-Martin J, Davis, DJ, Padbury JF, et al: Antenatal thyrotropin-releasing hormone to prevent lung disease in preterm infants. N Engl J Med 1998;338:493–498.

In this study, the odds that an infant with respiratory distress syndrome was exposed to TRH are

260 / 244 = 1.066

and the odds that an infant without respiratory distress syndrome was exposed to TRH are

132 / 133 = 0.992

Putting these two odds together to obtain the odds ratio gives

1.066 / 0.992 = 1.07, or approximately 1.1

An odds ratio of 1.1 means that an infant in the TRH group is 1.1 times more likely to develop respiratory distress syndrome than an infant in the placebo group. This risk does not appear to be much greater, and Ballard and coworkers (1998) reported that the odds ratio was not statistically significant.

The odds ratio is also called the cross-product ratio because it can be defined as the ratio of the product of the diagonals in a 2 x 2 table:

OR = AD / BC = (260)(133) / (132)(244) = 34,580 / 32,208 = 1.07

In case–control studies, the investigator decides how many subjects with and without the disease will be studied. This is the opposite of cohort studies and clinical trials, in which the investigator decides the number of subjects with and without the risk factor. The odds ratio should therefore be used with case–control studies.
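The cross-product form is the easiest to program. A minimal sketch using the counts in Table 3–21 (the function name is ours):

def odds_ratio(a, b, c, d):
    # a, b = outcome present/absent with the risk factor (TRH);
    # c, d = outcome present/absent without it (placebo)
    return (a * d) / (b * c)

print(round(odds_ratio(260, 132, 244, 133), 2))   # 1.07, reported as 1.1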

Readers interested in more detail are referred to the very readable elementary text on epidemiology by Fletcher and colleagues (1996). Information on other measures of risk used in epidemiology can be found in Greenberg (2000).

Graphs for Two Characteristics

Most studies in medicine involve more than one characteristic, and graphs displaying the relationship between two characteristics are common in the literature. No graphs are commonly used for displaying a relationship between two characteristics when both are measured on a nominal scale; the numbers are simply presented in contingency tables. When one of the characteristics is nominal and the other is numerical, the data can be displayed in box plots like the one in Figure 3–6 or error plots, as in Figure 3–8.

Also common in medicine is the use of bivariate plots (also called scatterplots or scatter diagrams) to illustrate the relationship between two characteristics when both are measured on a numerical scale. In the study by Hébert and colleagues (1997), information was collected on the mental functioning of each patient at three times, each 1 year apart. Box 3–1 contains a scatterplot of mental functioning scores at times 1 and 3 for women age 85 or older. A scatterplot is constructed by drawing X- and Y-axes; the characteristic hypothesized to explain or predict or the one that occurs first (sometimes called the risk factor) is placed on the X-axis. The characteristic or outcome to be explained or predicted or the one that occurs second is placed on the Y-axis. In applications in which a noncausal relationship is hypothesized, placement for the X- and Y-axes does not matter. Each observation is represented by a small circle; for example, the circle in the lower right in the graph in Box 3–1 represents subject 237, who had a score of 7 at baseline and a score of 1 two years later. More information on interpreting scatterplots is presented in Chapter 8, but we see here that the data in Box 3–1 suggest the possibility of a positive relationship between the two scores. At this point, we cannot say whether the relationship is significant or one that simply occurs by chance; this topic is covered in Chapter 8.

Box 3–1. Illustration of a Scatterplot.



Pearson Correlations

            Time 1     Time 3
Time 1    1.000000   0.582715
Time 3    0.582715   1.000000

Source: Data, used with permission, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling very elderly population. Am J Epidemiol 1997;145:935–944. Plot produced with NCSS; used with permission.

Some of you may notice that fewer data points occur in Box 3–1 than in Table 3–17. This results when several data points have the same value. Both NCSS and SPSS have an option for using "sunflowers," in which each sunflower petal stands for one observation.

Use the CD-ROM [available only with the book] to produce a scatterplot, and choose the sunflower option. Do you think this is helpful in interpreting the plot?

As a final note, there is a correspondence between the size of the correlation coefficient and a scatterplot of the observations. We also included in Box 3–1 the output from NCSS giving the correlation coefficient. Recall that a correlation of 0.58 indicates a moderate to good relationship between the two mental functioning scores. When the correlation is near 0, the shape of the pattern of observations is more or less circular. As the value of the correlation gets closer to +1 or –1, the shape becomes more elliptical, until, at +1 and –1, the observations fall directly on a straight line. With a correlation of 0.58, we expect a scatterplot of the data to be somewhat oval-shaped, as it is in Box 3–1.

Whether to use tables or graphs is generally based on the purpose of the presentation of the data. Tables give more detail and can display information about more characteristics, but they take longer to read and synthesize. Graphs present an easy-to-interpret picture of the data and are appropriate when the goal is to illustrate the distribution or frequencies without specific details.

Examples of Misleading Charts & Graphs

The quality of charts and graphs published in the medical literature is higher than that in similar displays in the popular press. The most significant problem with graphs (and tables as well) in medical journal articles is their complexity. Many authors attempt to present too much information in a single display, and it may take the reader a long time to make sense of it.

The purpose of tables and graphs is to present information (often based on large numbers of observations) in a concise way so that observers can comprehend and remember it more easily. Charts, tables, and graphs should be simple and easily understood by the reader, and concise but complete labels and legends should accompany them.

Knowing about common errors helps you correctly interpret information in articles and presentations. We illustrate four errors we have seen with sufficient frequency to warrant their discussion. We use hypothetical examples and do not imply that they necessarily occurred in the presenting problems used in this text. If you are interested in learning more about table and graph construction, a discussion by Wainer (1992) makes recommendations for designing tables for data. Spirer and colleagues (1998) provide an entertaining discussion of graphs, and Briscoe (1996) has suggestions for improving all types of presentations and posters, as well as publications.

A researcher can make a change appear more or less dramatic by selecting a starting time for a graph, either before or after the change begins. Figure 3–10A shows the decrease in annual mortality from a disease, beginning in 1960 and continuing with the projected mortality through 2010. The major decrease in mortality from this disease occurred in the 1970s. Although not incorrect, a graph that begins in 1980 (Figure 3–10B) deemphasizes the decrease and implies that the change has been small.


If the values on the Y-axis are large, the entire scale cannot be drawn. For example, suppose an investigator wants to illustrate the number of deaths from cancer, beginning in 1960 (when there were 200,000 deaths) to the year 2010 (when 600,000 deaths are projected). Even if the vertical scale is in thousands of deaths, it must range from 200 to 600. If the Y-axis is not interrupted, the implied message is inaccurate; a misunderstanding of the scale makes the change appear larger than it really is. This error, called suppression of zero, is common in histograms and line graphs. Figure 3–11A illustrates the effect of suppression of zero on the number of deaths from cancer per year; Figure 3–11B illustrates the correct construction. The error of suppression of zero is more serious on the Y-axis than on the X-axis, because the scale on the Y-axis represents the magnitude of the characteristic of interest. Many researchers today use computer programs to generate their graphics. Some programs make it difficult to control the scale of the Y-axis (and the X-axis as well). As readers, we therefore need to be vigilant and not be unintentionally misled by this practice.


The magnitude of change can also be enhanced or minimized by the choice of scale on the vertical axis. For example, suppose a researcher wishes to compare the ages at death in a group of men and a group of women. Figure 3–12A, by compressing the scale, indicates that the ages of men and women at death are similar; Figure 3–12B, by stretching the scale, magnifies the differences in age at death between men and women.

Our final example is a table that gives irrelevant percentages, a somewhat common error. Suppose that the investigators are interested in the relationship between levels of patient compliance and their type of insurance coverage. When two or more measures are of interest, the purpose of the study generally determines which measure is viewed within the context of the other. Table 3–22A shows the percentage of patients with different types of insurance coverage within three levels of patient compliance, so the percentages in each column total 100%. The percentages in Table 3–22A make sense if the investigator wishes to compare the type of insurance coverage of patients who have specific levels of compliance; it is possible to conclude, for example, that 35% of patients with low levels of compliance have no insurance.

Table 3–22. Effect of Calculating Column Percentages versus Row Percentages for Study of Compliance with Medication versus Insurance Coverage.

A. Percentages Based on Level of Compliance (Column %)

                            Level of Compliance with Medication
Insurance Coverage            Low      Medium      High
Medicaid                       30          20        15
Medicare                       20          25        30
Medicaid and Medicare           5           5         5
Other insurance                10          30        40
No insurance                   35          20        10

B. Percentages Based on Insurance Coverage (Row %)

                            Level of Compliance with Medication
Insurance Coverage            Low      Medium      High
Medicaid                       45          30        25
Medicare                       25          35        40
Medicaid and Medicare          33          33        33
Other insurance                15          35        50
No insurance                   55          30        15

Contrast this interpretation with that obtained if percentages are calculated within insurance status, as in Table 3–22B, in which percentages in each row total 100%. From Table 3–22B, one can conclude that 55% of patients with no insurance coverage have a low level of compliance. In other words, the format of the table should reflect the questions asked in the study. If one measure is examined to see whether it explains another measure, such as insurance status explaining compliance, investigators should present percentages within the explanatory measure.
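
Most statistical packages can produce either orientation on request. As a minimal sketch (shown here in Python with pandas, using a handful of invented patient records rather than the study's data), the normalize argument of pd.crosstab switches between column and row percentages:

    import pandas as pd

    # A few invented patient records; each row is one patient
    df = pd.DataFrame({
        "insurance":  ["Medicaid", "Medicare", "No insurance",
                       "Medicaid", "Other", "No insurance"],
        "compliance": ["Low", "High", "Low", "Medium", "High", "Low"],
    })

    # Column percentages (as in Table 3-22A): within each compliance level
    col_pct = pd.crosstab(df["insurance"], df["compliance"],
                          normalize="columns") * 100

    # Row percentages (as in Table 3-22B): within each insurance category
    row_pct = pd.crosstab(df["insurance"], df["compliance"],
                          normalize="index") * 100

    print(col_pct.round(1))
    print(row_pct.round(1))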

Computer Programs

As you have already seen, we give examples of output from computer packages especially designed to analyze statistical data. As much as possible, we reproduce the actual output obtained in analyzing observations from the presenting problems, even though the output frequently contains statistics not yet discussed. We discuss the important aspects of the output, and, for the time being, you can simply ignore unfamiliar information on the printout; in subsequent chapters, we will explain many of the statistics. Statistical computer programs are designed to meet the needs of researchers in many different fields, so some of the statistics in the printouts may rarely be used in medical studies and hence are not included in this book. We use the output from several comprehensive statistical programs in this text, including NCSS, SPSS, and JMP. For the most part, we concentrate on the first two packages. In later chapters we also illustrate programs for estimating the sample size needed for a study.

Summary

This chapter presents two important biostatistical concepts: the different scales of measurement and the way the scale of measurement determines how observations should be summarized and displayed. Some of the summary measures we introduce in this chapter form the basis of statistical tests illustrated in subsequent chapters.

The simplest level of measurement is a nominal scale, also called a categorical, or qualitative, scale. Nominal scales measure characteristics that can be classified into categories; the number of observations in each category is counted. Proportions, ratios, and percentages are commonly used to summarize categorical data. Nominal characteristics are displayed in contingency tables and in bar charts.

Ordinal scales are used for characteristics that have an underlying order. The differences between values on the scale are not equal throughout the scale. Examples are many disease staging schemes, which have four or five categories corresponding to the severity of the disease. Medians, percentiles, and ranges are the summary measures of choice because they are less affected by outlying measurements. Ordinal characteristics, like nominal characteristics, are displayed in contingency tables and bar charts.

Numerical scales are the highest level of measurement; they are also called interval, or quantitative, scales. Characteristics measured on a numerical scale can be continuous (taking on any value on the number line) or discrete (taking on only integer values).

We recommend that the mean be used with observations that have a symmetric distribution. The median, also a measure of the middle, is used with ordinal observations or numerical observations that have a skewed distribution. When the mean is appropriate for describing the middle, the standard deviation is appropriate for describing the spread, or variation, of the observations. The value of the standard deviation is affected by outlying or skewed values, so percentiles or the interquartile range should be used with observations for which the median is appropriate. The range gives information on the extreme values, but alone it does not provide insight into how the observations are distributed.
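
A small numeric illustration may help. The following Python sketch uses an invented, right-skewed sample to show how a single extreme value pulls the mean and standard deviation upward while leaving the median and interquartile range largely untouched.

    import statistics

    # Invented right-skewed sample (say, hypothetical lengths of stay in days)
    values = [2, 3, 3, 4, 4, 5, 6, 7, 30]

    mean = statistics.mean(values)      # pulled upward by the outlier (30)
    median = statistics.median(values)  # resistant to the outlier
    sd = statistics.stdev(values)       # inflated by the outlier

    # Quartiles; the interquartile range spans the middle 50% of the data
    q1, _, q3 = statistics.quantiles(values, n=4)
    print(f"mean={mean:.1f}  median={median}  sd={sd:.1f}  IQR={q3 - q1:.1f}")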

An easy way to determine whether the distribution of observations is symmetric or skewed is to create a histogram or box plot. Other graphic methods include frequency polygons or line graphs, and error plots. Although each method provides information on the distribution of the observations, box plots are especially useful as concise displays because they show at a glance the distribution of the values. Stem-and-leaf plots combine features of frequency tables and histograms; they show the frequencies as well as the shape of the distribution. Frequency tables summarize numerical observations; the scale is divided into classes, and the number of observations in each class is counted. Both frequencies and percentages are commonly used in frequency tables.
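
For readers who wish to experiment, this minimal Python sketch (with invented values) draws a histogram and a box plot of the same observations side by side:

    import matplotlib.pyplot as plt

    # Invented observations with a mild right skew
    values = [22, 25, 27, 28, 30, 31, 33, 35, 38, 42, 55, 70]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
    ax1.hist(values, bins=6)   # histogram: shape of the distribution
    ax1.set_title("Histogram")
    ax2.boxplot(values)        # box plot: median, quartiles, outlying values
    ax2.set_title("Box plot")
    plt.tight_layout()
    plt.show()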

When measurements consist of one nominal and one numerical characteristic, frequency polygons, box plots, and error plots illustrate the distribution of numerical observations for each value of the nominal characteristic.

The correlation coefficient indicates the degree of the relationship between two numerical characteristics measured on the same group of individuals. Spearman's rank correlation is used with skewed or ordinal observations. When the characteristics are measured on a nominal scale and proportions are calculated to describe them, the relative risk or the odds ratio may be used to measure the relationship between two characteristics.
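
Both coefficients are one-line computations in most packages. The sketch below uses Python's scipy library with invented paired measurements (chosen to loosely echo the inverse relationship between age and heart rate variation that appears in the exercises):

    from scipy import stats

    # Invented paired measurements on the same subjects
    age = [25, 34, 41, 48, 55, 62, 70]
    hr_variation = [64, 58, 50, 41, 33, 28, 17]

    pearson_r, _ = stats.pearsonr(age, hr_variation)      # linear association
    spearman_rho, _ = stats.spearmanr(age, hr_variation)  # rank-based

    print(f"Pearson r = {pearson_r:.2f}, Spearman rho = {spearman_rho:.2f}")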

We used data from the study by Kline and colleagues (2002) to illustrate the calculation of common statistics for summarizing data, such as the mean, median, and standard deviation, and to provide some useful ways of displaying data in graphs. The study, conducted in the emergency departments of seven urban hospitals, involved the prospective collection of data from 934 patients who had undergone a pulmonary vascular imaging study (contrast-enhanced CT scan of the chest or a ventilation/perfusion lung scan [V/Q scan]) because of the clinical suspicion of a pulmonary embolism (PE). A final diagnosis of PE was established using a combination of vascular imaging studies plus use of other objective tests and telephone follow-up at 6 months. Two medical students interviewed each patient independently to collect clinical data that were analyzed using multivariate logistic regression analysis (discussed in Chapter 10) to select six variables (age, shock index, unexplained hypoxemia, unilateral leg swelling, recent surgery, and hemoptysis) significantly associated with the presence of PE. They constructed a "decision rule" using the six variables to define a high-risk group of patients with a 40% pretest probability of PE.

Hébert and colleagues (1997) in Presenting Problem 2 focused on disability and functional changes in the elderly. We used observations on subjects 85 years of age or older to illustrate stem-and-leaf plots and box plots. Hébert and colleagues reported that baseline SMAF scores indicated that women were significantly more disabled than men for activities of daily living, mobility, and mental function. Women were more independent in instrumental activities of daily living (housekeeping, meal preparation, shopping, medication use, budgeting). Generally, subjects showed significant declines in all areas of functioning between baseline interview and the second interview 1 year later. Functional decline was associated with age, but not with sex. Interestingly, the functional score declines were not significant (except for a slight decline in instrumental activities of daily living) between the second and third interviews. The authors proposed three explanations to account for this phenomenon: floor effect, survival effect, and regression toward the mean—topics discussed later in this text. Disability is one of the important outcome measures in studies of the elderly population. We also examined the relationship between SMAF scores at baseline and 2 years later in this study and found a moderate to good relationship between these measures.

The results of the study by Lapidus and colleagues (2002) on screening for domestic violence (Presenting Problem 3) were used to illustrate that proportions and percentages can be used interchangeably to describe the relationship of a part to the whole; ratios relate the two parts themselves. When a proportion is calculated over time, the result is called a rate. Some of the rates commonly used in medicine were defined and illustrated. For comparison of rates from two different populations, the populations must be similar with respect to characteristics that might affect the rate; adjusted rates are necessary when these characteristics differ between the populations. In medicine, rates are frequently adjusted for disparities in age. Contingency tables display two nominal characteristics measured on the same set of subjects. Bar charts are an effective way to illustrate nominal data.
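
Direct age adjustment is, at bottom, a weighted average: each stratum-specific rate is weighted by that stratum's share of a standard population. A minimal Python sketch, with entirely hypothetical rates and weights:

    # Hypothetical stratum-specific rates per 1,000 person-years
    rates = {"<45": 1.2, "45-64": 5.8, ">=65": 22.4}

    # Hypothetical standard-population weights (proportions summing to 1)
    weights = {"<45": 0.60, "45-64": 0.25, ">=65": 0.15}

    # Directly adjusted rate: weighted average of the stratum rates
    adjusted = sum(rates[g] * weights[g] for g in rates)
    print(f"age-adjusted rate: {adjusted:.2f} per 1,000")   # 5.53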

Acquired hemophilia is a rare, life-threatening disease caused by the development of autoantibodies directed against factor VIII. It is often associated with an underlying disease. In Presenting Problem 4, Bossi and colleagues (1998) studied the characteristics and outcomes of 34 patients who had this disease. The results from their study help clinicians understand the presentation and clinical course of acquired hemophilia. Treatments have included administration of porcine factor VIII, immunosuppressive drugs, and intravenous immunoglobulins. These researchers point out the need for randomized, controlled studies of treatment.

The study by Ballard and coworkers (1998) found that the antenatal administration of thyrotropin-releasing hormone had no effect on the pulmonary outcome in these premature infants. No significant differences occurred between the treatment and placebo groups in the incidence of respiratory distress syndrome, death, or chronic lung disease. We used the data to illustrate the odds ratio for the development of respiratory distress, and our results (not significant) agreed with those of the authors. The investigators concluded that treatment with thyrotropin-releasing hormone is not indicated for women at risk of delivering a premature infant.
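
For a generic 2 × 2 table, both the odds ratio and the risk ratio reduce to simple arithmetic. The Python sketch below uses invented counts, not data from any of the studies discussed in this chapter:

    # Generic 2 x 2 table, with the usual layout:
    #                outcome present   outcome absent
    # exposed              a                 b
    # unexposed            c                 d

    def odds_ratio(a, b, c, d):
        """Odds of the outcome in exposed versus unexposed: (a/b) / (c/d)."""
        return (a / b) / (c / d)

    def risk_ratio(a, b, c, d):
        """Risk of the outcome in exposed versus unexposed."""
        return (a / (a + b)) / (c / (c + d))

    # Invented counts
    print(odds_ratio(20, 80, 10, 90))   # 2.25
    print(risk_ratio(20, 80, 10, 90))   # 2.00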

Exercises

1. Show that the sum of the deviations from the mean is equal to 0. Demonstrate this fact by finding the sum of the deviations for heart rate variation in Table 3–3.

2. In an effort to establish a normative value for heart rate variation to deep breathing (RR_VAR), a noninvasive test used to assess suspected cardiovascular autonomic dysfunction, Gelber and associates (1997) evaluated heart rate variation data from 580 patients in 63 different locations over a period of 15 years. Using the data set in a folder on the CD-ROM [available only with the book] entitled "Gelber," complete the following:

  a. Calculate the mean and standard deviation of heart rate variation (RR_VAR).
  b. Generate a frequency table of RR_VAR for patients using the following categories: 2–20, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80, 81–90, and > 90.
  c. Generate box plots of RR_VAR according to gender.
  d. Normal limits of many laboratory values are set by the 2½ and 97½ percentiles, so that the normal limits contain the central 95% of the distribution. This approach was taken by Gelber and colleagues when they developed norms for mean heart rate variation to deep breathing and the Valsalva ratio. Find the normal limits for heart rate variation (a computational sketch follows this exercise).
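
As noted in part d, the percentile calculation itself is brief; here is a minimal Python sketch using numpy, with a short invented sample standing in for the actual RR_VAR column:

    import numpy as np

    # Invented heart rate variation values; substitute the RR_VAR column
    # from the Gelber data set to answer the exercise.
    rr_var = np.array([12.3, 25.1, 33.8, 41.0, 48.6,
                       55.2, 63.9, 72.4, 88.0, 95.5])

    # Normal limits defined as the central 95% of the distribution
    lower, upper = np.percentile(rr_var, [2.5, 97.5])
    print(f"normal limits: {lower:.1f} to {upper:.1f}")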

3. Again using the Gelber data set, generate a frequency table showing the mean, median, minimum, and maximum heart rate variation for patients in age categories. Use the age category column in the data set and the Descriptives procedure in NCSS. There are 490 patients for whom both age and heart rate variation are available. Your table should look like Table 3–23. Estimate the overall mean age of the patients. Estimate the overall mean heart rate variation. Compare each to the mean calculated with NCSS. Which is more accurate? Why?

Table 3–23. Frequency Table Showing Mean Heart Rate Variation to Deep Breathing Broken Down by 10-Year Age Groups.

                          Heart Rate Variation
Age        Count      Mean    Median    Minimum    Maximum
11–20         52      63.0      57.6       24.0      150.9
21–30        162      57.6      56.1       13.3      124.4
31–40        144      51.1      48.3       14.0      128.9
41–50         78      39.6      37.6        9.4      105.5
51–60         20      34.0      32.3        7.2       85.7
61–70         23      29.1      23.5        3.3       70.9
> 70          11      16.8      17.3        2.1       28.3
Total        490

Source: Data, used with permission of the author and publisher, from Gelber DA, Pfeifer M, Dawson B, Shumer M: Cardiovascular autonomic nervous system tests: Determination of normative values and effect of confounding variables. J Auton Nerv Syst 1997;62:40–44. Table produced with NCSS; used with permission.

4. Use the data from the "Bossi" file to form a 2 × 2 contingency table with the frequencies of hematuria in the columns and whether patients had RBC units > 5 (gt5rbc in the data file) in the rows. After you have found the numbers in the cells, use the NCSS program for two proportions (under Analysis and Other) to find the odds ratio.

5. What is the most likely shape of the distribution of observations in the following studies?

  a. The age of subjects in a study of patients with Crohn's disease.
  b. The number of babies delivered by all physicians who delivered babies in a large city during the past year.
  c. The number of patients transferred to a tertiary care hospital by other hospitals in the region.

6. Draw frequency polygons to compare men's and women's SMAF scores on mental functioning at time 1 in the study by Hébert and coworkers (1997). Repeat for time 2. What do you conclude?

7. The computational formula for the standard deviation is

    s = \sqrt{ \frac{ \sum X^2 - ( \sum X )^2 / n }{ n - 1 } }

Illustrate that the value of the standard deviation calculated from this formula is equivalent to that found with the definitional formula, using the shock index data in Table 3–3. From the section titled "The Standard Deviation," the value of the standard deviation of shock index using the definitional formula is 0.27. (Use the sums in Table 3–3 to save some calculations.)
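
A quick way to convince yourself that the two formulas agree, before working the exercise by hand, is a few lines of Python with an invented sample:

    import math

    # Invented sample; the exercise itself uses the shock index data in Table 3-3
    x = [0.61, 0.72, 0.55, 0.93, 0.84]
    n = len(x)
    mean = sum(x) / n

    # Definitional formula: based on squared deviations from the mean
    sd_def = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

    # Computational formula: needs only the sum of x and the sum of x squared
    sd_comp = math.sqrt((sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1))

    assert abs(sd_def - sd_comp) < 1e-12
    print(sd_def, sd_comp)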

8. The following questions give brief descriptions of some studies; you may wish to refer to the articles for more information.

  a. Khalakdina and colleagues (2003) recruited patients with cryptosporidiosis and age-matched controls. Subjects in both groups were interviewed by telephone to obtain information about previous exposures. What statistic is best to summarize their findings?
  b. Brown and coworkers (2003) studied a group of 68 adults who were allergic to ant stings; each subject was randomly assigned to receive either venom immunotherapy or a placebo. After a sting challenge, any reactions were recorded. What statistic is best to summarize their findings?
  c. Medical students were asked to assess their competency in performing several cancer screening examinations in a study by Lee and colleagues (2002). What statistic would be best to summarize the average opinions by the students on each competency?
  d. Grodstein and colleagues (2000) examined data from the Nurses' Health Study to study the relationship between duration of postmenopausal hormone therapy and the risk of coronary heart disease in women. What statistics are best to describe the distribution of duration of treatment in those who did and those who did not subsequently experience coronary heart disease? What graphical methods are appropriate?
  e. Kreder and colleagues (2003) studied the effect of provider volume on complication rates after total knee arthroplasty in patients. Low provider volume was related to length of stay in hospital. What graphical method is best to demonstrate the relationship between provider volume and complication rate?

9. What measures of central tendency and dispersion are the most appropriate to use with the following sets of data?

  a. Salaries of 125 physicians in a clinic
  b. The test scores of all medical students taking USMLE Step I of the National Board Examination in a given year
  c. Serum sodium levels of healthy individuals
  d. Number of tender joints in 30 joints evaluated on a standard examination for disease activity in rheumatoid arthritis patients
  e. Presence of diarrhea in a group of infants
  f. The disease stages for a group of patients with Reye's syndrome (six stages, ranging from 0 = alert wakefulness to 5 = unarousable, flaccid paralysis, areflexia, pupils unresponsive)
  g. The age at onset of breast cancer in females
  h. The number of pills left in subjects' medicine bottles when investigators in a study counted the pills to evaluate compliance in taking medication

10. Examine the pattern of distribution of mean heart rate variation for different age groups in Table 3–23 (Gelber et al, 1997). What do you observe? How would you learn whether or not your hunch is correct?

11. The correlation between age and heart rate variation is –0.45 (Gelber et al, 1997). How do you interpret this value? What are the implications for norms for heart rate variation?

12. Refer to Figure 3–2 to answer the following questions:

  a. What is the mean weight of girls 24 months old?
  b. What is the 90th percentile for head circumference for 12-month-old girls?
  c. What is the fifth percentile in weight for 12-month-old girls?

13. Find the coefficient of variation of mean change in red blood cell units for men and for women using the data from Bossi and colleagues (1998). Does one sex have greater relative variation in the number of red blood cells?

14. Refer to the Physicians' Health Study (Steering Committee of the Physicians' Health Study Research Group, 1989) in Table 3–19 to answer the following questions:

  a. The authors used person-years of observation to calculate the odds ratio. Calculate the relative risk using person-years of observation and compare its value to the value we obtained. Give some reasons for their similarity in magnitude. Under what circumstances could they differ?
  b. The authors also calculated the odds ratio adjusted for age and use of beta-carotene. What do they mean by this statement?
  c. How could the healthy volunteer effect contribute to the finding of no difference in total mortality from cardiovascular causes between the aspirin and placebo group?

15. From their own experiences in an urban public hospital, Kaku and Lowenstein (1990) noted that stroke related to recreational drug use was occurring more frequently in young people. To investigate the problem, they identified all patients between 15 and 44 years of age admitted to a given hospital and selected sex- and age-matched controls from patients admitted to the hospital with acute medical or surgical conditions for which recreational drug abuse has not been shown to be a risk factor. Data are given in Table 3–24. What is the odds ratio?

Table 3–24. Data for Odds Ratio for Stroke with History of Drug Abuse.

                   Stroke    Control
Drug Abuse             73         18
No Drug Abuse         141        196
Total                 214        214

Source: Reproduced, with permission, from Kaku DA, Lowenstein DH: Emergence of recreational drug abuse as a major risk factor for stroke in young adults. Ann Intern Med 1990;113:821–827.

16. Group Exercise. Obtain a copy of the study by Moore and colleagues (1991) from your medical library, and answer the following questions:

a. What was the purpose of this study?
b. What was the study design?
c. Why were two groups of patients used in the study?
d. Examine the box plots in the article's Figure 1. What conclusions are possible from the plots?
e. Examine the box plots in the article's Figure 2. What do these plots tell you about pH levels in normal healthy men?
17. Group Exercise. It is important that scales recommended to physicians for use in assessing risk or making management decisions be shown to be reliable and valid. Select an area of interest, and consult some journal articles that describe scales or decision rules. Evaluate whether the authors presented adequate evidence for the reproducibility and validity of these scales. What kind of reproducibility was established? What type of validity? Are these sufficient to warrant the use of the scale? (For example, if you are interested in assessing surgical risk for noncardiac surgery, you can consult the articles on an index of cardiac risk by Goldman [1995] and Goldman and associates [1977], as well as a follow-up report of an index developed by Detsky and colleagues [1986].)