Central Tendency
First, we're interested in the center of the distribution or "typical" values.
Mean: The mean of a population is termed
and the mean of a sample is denoted12
. One interpretation of the mean is that if you had to give one estimate for someone's cholesterol, the mean would be best. "Best" in the sense that it would minimize the average error13 you'd make in the long run. Outliers (extreme values) may strongly affect the mean.
Note: It's called an arithmetic mean to distinguish it from a geometric mean. The geometric mean is different.
Mean diamond: One JMP-specific feature of box plots is the mean diamond. The center of the diamond is the mean and the extent of the diamond encloses a 95% confidence interval on the mean.
Median: The median is the value that divides the set of data into two parts: Half (50%) are above the median and half below. The median is less affected by extreme values.
12 In JMP terminology, the response variable is y, not x. So, to be consistent with JMP we'll change notation to be different than Daniel (who uses
).
13 We score "errors" by calculating the squared difference between your guess and the true value.