Date last modified: 1/8/08
Data Sets Resources Page
 

This page provides links to some interesting data sets that I have collected and used in teaching my classes. Unless otherwise noted, each data set is a comma-delimited ASCII file with a ".txt" extension. Such a file can be imported into NCSS (click on "File" and then "Import") and into other software. Each data set also has a companion file with the same name plus an additional "1" at the end of the name, also with a ".txt" extension. These companion files are standard ASCII DOS text files that provide background information, descriptions of the data, definitions of the variables, and any other pertinent information that might be useful. The link to this descriptive file is always the "Description" link following the listed data set.
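For readers who want to load one of these files outside NCSS, the following is a minimal Python sketch of reading comma-delimited text with the standard `csv` module. The function name `read_comma_delimited` and the sample content are illustrative assumptions; in particular, whether a given file starts with a row of variable names is not stated on this page, so check the "Description" file first.

```python
import csv
import io


def read_comma_delimited(handle):
    """Return the non-blank rows of a comma-delimited text stream as lists of strings."""
    reader = csv.reader(handle, skipinitialspace=True)
    # csv.reader yields an empty list for a blank line; drop those.
    return [row for row in reader if row]


# Made-up placeholder content, NOT taken from any data set on this page.
# It assumes a first row of variable names, which a real file may not have.
sample = io.StringIO("x,y\n1.0, 2.5\n\n2.0, 3.5\n")

rows = read_comma_delimited(sample)
header, records = rows[0], rows[1:]
values = [[float(cell) for cell in record] for record in records]

print(header)
print(values)
```

To read a downloaded file instead of the in-memory sample, pass an open file object, for example `open("dataset.txt", newline="")`, where `dataset.txt` is a hypothetical file name.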

Table of 2000 Random Digits Description
A collection of digits arranged so that there is no correlation between a digit and any of its neighbors. Simply click on the file; it will appear in your browser window, and you can then print it.
Z Percentiles Data Set Description
A data set with two variables. The first variable gives the values of the Z-score and the second gives the cumulative probability for that Z-score (51 pairs of points).
Compressive Strength of Concrete Blocks Data Set Description
Measurements of the compressive strength of concrete blocks, in thousands of pounds per square inch, for n=90 observations.
Average Length of Growing Season Description
The average length of the growing season for n=57 major U.S. cities.
Average Number of Days per Year that had Thunderstorms Description
The average number of days per year on which one or more thunderstorms occurred for n=81 major U.S. cities.
Newcomb's Speed of Light Data Description
n=66 observations made by Simon Newcomb in 1882 in Washington, D.C.
Homeruns Data Set (entire career) Description
Number of home runs per year for Ruth, Mantle, Gehrig, and Maris over their entire careers as professional baseball players.
Homeruns Data Set (years with the NY Yankees) Description
Number of home runs per year for Ruth, Mantle, Gehrig, and Maris during their years as professional baseball players with the New York Yankees.
Misfeeding Leads Data Description
Sheesley's Misfeeding Leads Data: 12 observations each on the Old Method and the New Method.
Heights of Eleven Year Old School Boys Description
Heights of eleven-year-old school boys, with n=1293 observations.
Aldrin & HCB Description
Concentration measurements of Aldrin and HCB in the Wolf River in Tennessee.
Optical Density Measurements Description
Observations made at two levels of concentration of Interleukin 2 in human sera; an example from the clinical chemistry of in-vitro medical devices.
Lead Concentration Measurements Description
n=64 observations made in 1976 and 1977 in Los Angeles, CA, as part of the air pollutants monitoring system.
Mt. St. Helens Earthquake Data Description
Dye Concentration Data Description
An example of a single continuous measurement variable that can be used to illustrate the one-sample t test with a large sample size.
Penny Weights Description
The weights in grams of 100 newly minted pennies as reported by W. J. Youden.
Mercury Concentration Data Description
Observations of the mercury concentration in the South River, VA, at six locations along the river; an example for one-way ANOVA and multiple comparisons.
Plankton Data Set Description
The raw counts and transformed counts of three types of plankton caught in nets.
Fuel Consumption - 10 cars Description
Gas mileage data with n=10 cars. Variables are the weight in thousands of pounds and the gallons of fuel consumed while driving 100 miles.
Hald Portland Cement Data Set Description
Four variables that are among the ingredients in Portland cement, with the response variable equal to the number of calories of heat generated in the hardening process.
Anscombe's Four Data Sets Description
Frank Anscombe's four data sets that each give the same fitted regression equation.
Fuel Consumption Data Set Description
Gas Mileage Data Set with n=38 observations using gallons per 100 miles driven, miles per gallon, weight, and weight^2 as the variables.
Electronic Inverter Data Description
An example for multiple regression; six independent variables and one dependent variable.
Another Fuel Consumption Data Set Description
Gas Mileage Data Set with n=10 automobiles using gallons per 100 miles driven, horsepower, weight, the product of weight and horsepower, weight^2, and horsepower^2 as the variables.
MPG Data Set Description
Gas Mileage Data Set with n=32 observations and seven independent variables.
Gorman-Toman Description
Asphalt Pavement Durability Data Set with n=31 observations and six independent variables; data taken from Gorman & Toman (1966).
Starting Salary versus Grade Point Average Description
A two-by-three contingency table that relates the starting salaries of newly graduated engineers to their graduating grade point averages.
Piston Ring Control Chart Data Description
An example of data suitable for Shewhart control charts.
2002 General Social Survey Data Set Description
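The claim in the Anscombe entry above, that all four data sets give the same fitted regression equation, can be checked with a short Python sketch. The values below are the widely published ones from Anscombe (1973), typed in here as an assumption rather than read from the file linked on this page, so verify them against that file before relying on them. The helper `fit_line` is an illustrative name for an ordinary least-squares fit of y on x.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed values: Anscombe's quartet as commonly published, not read from this page.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared x for sets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]         # x for set IV
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Fit each data set separately and print the rounded coefficients.
fits = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (intercept, slope) in fits.items():
    print(f"Set {name}: y = {intercept:.2f} + {slope:.3f} x")
```

The fitted coefficients agree to about two decimal places, yet scatterplots of the four sets look very different, which is why the quartet is a standard argument for plotting data before fitting a line.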


James M. Davenport © 2008