This page provides links to some interesting
data sets that I have collected and used in teaching
my classes. Unless otherwise noted, each data set is an ASCII
comma-delimited file with a ".txt" extension. Such a file
can be imported into NCSS (click on "File" and then "Import")
and into other software. Each data set is accompanied by a
second file with the same name plus an additional "1" at the
end of the name, also with a ".txt" extension. These are
standard ASCII DOS text files that provide background information,
descriptions of the data, definitions of the variables, and
any other pertinent information that might be useful. The
link to this descriptive file is always the "Description"
link following the listed data set.
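For readers who would rather load a file with a script than with NCSS, the naming rule and the comma-delimited layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the file name "pennies.txt", the helper names, and the sample values are made up for the example, and whether a given file begins with a header row has to be checked in its description file.

```python
import csv
import io
from pathlib import PurePosixPath


def description_name(data_name: str) -> str:
    """Return the companion description file name: same name plus a "1"."""
    p = PurePosixPath(data_name)
    return f"{p.stem}1{p.suffix}"


def read_comma_delimited(text: str) -> list[list[str]]:
    """Parse comma-delimited text into rows of strings, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


# Hypothetical file name, used only to show the naming rule.
print(description_name("pennies.txt"))

# Made-up sample content standing in for a downloaded data file.
sample = "weight,gallons\n2.5,4.1\n3.0,5.0\n"
rows = read_comma_delimited(sample)
print(rows)
```

For a real file, the same parser can be applied to the text returned by open(path, newline="").read().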
Table of 2000 Random Digits | Description
A collection of digits arranged so that there is no correlation
between a digit and any of its neighbors. Simply click on
the file; it will appear in your browser window, and you can
then print it.
Z Percentiles Data Set | Description
A data set with two variables. The first variable gives the
values of the Z-score and the second variable gives
the cumulative probability for that Z-score (51 pairs of
points).
Compressive Strength of Concrete Blocks Data Set | Description
Measurements of the compressive strength of concrete blocks,
in thousands of pounds per square inch, for n=90 observations.
Average Length of Growing Season | Description
The average length of the growing season for n=57 major U.S. cities.
Average Number of Days per Year that had Thunderstorms | Description
The average number of days per year on which one or more
thunderstorms occurred for n=81 major U.S. cities.
Newcomb's Speed of Light Data | Description
n=66 observations made by Simon Newcomb in 1882 in Washington, D.C.
Homeruns Data Set (entire career) | Description
Homeruns Data Set for Ruth, Mantle, Gehrig, and Maris; number of
homeruns per year for their entire career as professional
baseball players.
Homeruns Data Set (years with the NY Yankees) | Description
Homeruns Data Set for Ruth, Mantle, Gehrig, and Maris; number of
homeruns per year for their years as professional baseball
players for the New York Yankees.
Misfeeding Leads Data | Description
Sheesley's Misfeeding Leads Data: 12 observations each on the Old
Method and a New Method.
Heights of Eleven Year Old School Boys | Description
Heights of eleven-year-old school boys; data set with n=1293 observations.
Aldrin & HCB | Description
Concentration measurements of Aldrin and HCB in the Wolf River in Tennessee.
Optical Density Measurements | Description
Observations made at two levels of concentration of Interleukin 2 in
human sera; an example from the clinical chemistry of
in-vitro medical devices.
Lead Concentration Measurements | Description
n=64 observations made in 1976 and 1977 in Los Angeles, CA as part of the
air pollutants monitoring system.
Mt. St. Helens Earthquake Data | Description
Dye Concentration Data | Description
An example of a single continuous measurement variable that can be
used to illustrate the one sample t test with a large
sample size.
| Description
Penny Weights | Description
The weights in grams of 100 newly minted pennies as reported by W.
J. Youden.
Mercury Concentration Data | Description
Observations made on the mercury concentration in the South River,
VA at six locations along the river; an example of the
One-Way ANOVA and multiple comparisons.
Plankton Data Set | Description
The raw counts and transformed counts of three types of plankton caught
in nets.
Fuel Consumption - 10 cars | Description
Gas mileage data with n=10 cars. Variables are the weight in thousands
of pounds and gallons of fuel consumed while driving 100
miles.
Hald Portland Cement Data Set | Description
Four variables that are among the ingredients in Portland cement with
the response variable equal to the number of calories
of heat generated in the hardening process.
| Description
Anscombe's Four Data Sets | Description
Frank Anscombe's four data sets that each give the same fitted regression
equation.
Fuel Consumption Data Set | Description
Gas Mileage Data Set with n=38 observations using gallons per 100
miles driven, miles per gallon, weight, and weight^2 as
the variables.
Electronic Inverter Data | Description
An example for multiple regression; six independent variables
and one dependent variable.
Another Fuel Consumption Data Set | Description
Gas Mileage Data Set with n=10 automobiles using gallons per 100 miles
driven, horsepower, weight, the product of weight and
horsepower, weight^2, and horsepower^2 as the variables.
MPG Data Set | Description
Gas Mileage Data Set with n=32 observations and seven independent
variables.
Gorman-Toman | Description
Asphalt Pavement Durability Data Set with n=31 observations and six independent
variables; data taken from Gorman & Toman (1966).
Starting Salary versus Grade Point Average | Description
A two by three contingency table that relates starting salaries
of newly graduated engineers with their respective graduating
grade point averages.
| Description
Piston Ring Control Chart Data | Description
An example of data suitable for using Shewhart Control Charts.
2002 General Social Survey Data Set | Description
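The Z Percentiles entry in the list above pairs each Z-score with its cumulative probability. As a rough sketch of how such pairs can be computed, the standard normal cumulative probability can be written with the error function from Python's math module. The grid used here (51 evenly spaced Z-scores from -2.5 to 2.5) is an assumption chosen only to match the stated count of 51 pairs; the actual Z values in the file may differ, so check its description file.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_table(start: float, stop: float, count: int) -> list[tuple[float, float]]:
    """Return `count` evenly spaced (z, cumulative probability) pairs."""
    step = (stop - start) / (count - 1)
    pairs = []
    for i in range(count):
        z = round(start + i * step, 10)  # rounding removes floating-point drift
        pairs.append((z, normal_cdf(z)))
    return pairs


# Assumed grid: -2.5 to 2.5 in steps of 0.1, which gives 51 pairs.
table = z_table(-2.5, 2.5, 51)
for z, p in table[:3]:
    print(f"{z:5.1f}  {p:.4f}")
```

The resulting pairs can be compared against the values in the data file or used to look up a percentile for a given Z-score.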