Graphical Displays
The distribution of a variable describes the values the variable can take on and how often (with what probability) each value occurs.
Histograms: For discrete (nominal) variables, there is one bar per value and the length of the bar is proportional to the number of observations with that value. For a continuous variable, a grouped histogram collapses values into ranges and shows one bar per range. The number of bars (equivalently, the bar width) and the location of the bar centers are often chosen by software. However, there is no agreement as to what's best. These choices should be made with an eye to the intent of the figure (that is, what is the figure trying to show?). Be aware that the choice of the center and width of grouped-histogram bars affects how you perceive the distribution of values.
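To see how much the bar placement can matter, here is a minimal sketch in Python. The data values and the helper name bin_counts are made up for illustration; the sketch simply counts the same twelve values using bars of the same width but two different starting points.

```python
def bin_counts(values, origin, width, n_bins):
    """Count how many values fall in each half-open range [left, right)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - origin) // width)  # which bar this value belongs to
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


# Hypothetical data, chosen only to illustrate the point.
data = [1.0, 1.9, 2.0, 2.1, 2.9, 3.0, 3.1, 3.9, 4.0, 4.1, 5.0, 5.9]

# Same bar width (2.0), different starting points.
counts_a = bin_counts(data, origin=0.0, width=2.0, n_bins=3)  # [0,2), [2,4), [4,6)
counts_b = bin_counts(data, origin=1.0, width=2.0, n_bins=3)  # [1,3), [3,5), [5,7)

for label, counts in (("origin 0", counts_a), ("origin 1", counts_b)):
    print(label)
    for c in counts:
        print("  " + "#" * c)  # a crude text histogram, one row per bar
```

With the first set of bars the counts are 2, 6, 4, which looks like a single peak in the middle. With the second set they are 5, 5, 2, which looks flat and then falls off. The data are identical; only the bar locations changed.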
Box plot: The "box" of the box plot is the range of data between the 25th percentile and the 75th percentile. Half of the data is in the box; it's drawn larger to emphasize where the middle values lie. The line in the box is the median. The width of the box is the interquartile range (IQR). Each whisker extends from the edge of the box to the most extreme data value that lies within 1.5 times the IQR of that edge. The whiskers are meant to draw your eye to the typical range of data values. Values beyond the whiskers are shown as dots and are interpreted as potential outliers.
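The pieces of a box plot can be computed by hand. The following is a minimal sketch in Python, using made-up data and a hypothetical helper named box_plot_summary. It assumes the "inclusive" quartile convention of Python's statistics.quantiles; other software may use slightly different percentile rules and so give slightly different boxes.

```python
import statistics


def box_plot_summary(values):
    """Return the quantities a box plot draws, as a dictionary."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme data values inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the whiskers is drawn as a dot.
        "potential_outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical data for illustration.
summary = box_plot_summary([2, 4, 5, 6, 7, 8, 9, 10, 12, 30])
for name, value in summary.items():
    print(f"{name}: {value}")
```

For these ten values the box runs from 5.25 to 9.75 with the median at 7.5, the whiskers reach 2 and 12, and the value 30 is drawn as a dot. Note that the upper whisker stops at 12, the largest data value inside the fence, not at the fence itself (16.5).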
Figure 1.2
Histogram and Box Plot

The figure above shows a histogram and box plot for continuous data.
Potential outliers: The dots in a box plot show potential outliers in the sense that they lie far from the middle of the data and so they may, potentially, be in error. Note that not all outliers are errors (and not all errors are outliers). Even perfectly normal data will have potential outliers.
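The last point can be checked by simulation. The sketch below, in Python, draws values from a standard normal distribution and counts how many fall beyond the 1.5 times IQR limits. The sample size, the seed, and the helper name count_flagged are arbitrary choices for illustration, and the quartiles again assume the "inclusive" convention.

```python
import random
import statistics


def count_flagged(values):
    """Count values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return sum(1 for v in values if v < low_fence or v > high_fence)


random.seed(12345)  # arbitrary seed, only so the run is repeatable
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # error-free normal data

flagged = count_flagged(sample)
print(f"{flagged} of {len(sample)} values flagged ({flagged / len(sample):.2%})")
```

In theory, roughly 0.7% of values from a normal distribution fall beyond these limits, so a sample of this size should show some dozens of dots even though none of the values is an error.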
What to Do with Potential Outliers
Some researchers just delete the outliers. This is wrong. Do not remove outliers from your data.
I assume you already know about the following numerical summaries: