More terminology: The top end of your box may also be called the "upper hinge"; the lower end may also be called the "lower hinge". The lower hinge is also called "the 25th percentile"; the median is "the 50th percentile"; the upper hinge is "the 75th percentile". This means that 25%, 50% and 75% of the data, respectively, is at or below that point. The distance between the hinges may be referred to as the "H-spread" or, as you will see on the following page, the "Interquartile Range", abbreviated "IQR". ("Hinge" actually has a different technical definition, but the term is sometimes used informally.)
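If you'd like to check the percentile idea for yourself, here is a minimal Python sketch. It borrows the eight-value data set and the quartiles from the five-number-summary example worked later on this page. The helper name `share_at_or_below` is my own invention for illustration, not a standard function.

```python
# Data and quartiles taken from the five-number-summary example on this page.
data = [53, 79, 80, 82, 87, 91, 93, 98]
lower_hinge, median_value, upper_hinge = 79.5, 84.5, 92.0


def share_at_or_below(values, point):
    """Return the fraction of values that are at or below the given point."""
    return sum(1 for v in values if v <= point) / len(values)


for name, point in [
    ("lower hinge", lower_hinge),
    ("median", median_value),
    ("upper hinge", upper_hinge),
]:
    print(f"{name} = {point}: {share_at_or_below(data, point):.0%} at or below")

# The H-spread (IQR) is the distance between the two hinges.
h_spread = upper_hinge - lower_hinge
print("H-spread (IQR):", h_spread)
```

For this particular list the shares come out to exactly 25%, 50%, and 75%; with other data sets (especially ones with repeated values) the shares may only be approximately that.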
Also, some books and software will include the overall median (Q2) when computing Q1 and Q3 for data sets with an odd number of elements. The Texas Instruments calculators do not include the median in this case, so you may encounter a book answer that doesn't match the calculator answer. And different software packages use all sorts of different formulas. Be careful to use the formula from your book when doing your homework.
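To see how the two conventions can disagree, here is a rough Python sketch of the median-of-halves method with a switch for whether the overall median is counted in each half. The function name `quartiles`, the `include_median` flag, and the seven-value sample list are all made up for this illustration; real calculators and software packages may use other formulas entirely.

```python
from statistics import median


def quartiles(values, include_median=False):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    include_median only matters when the list has an odd number of values:
    it decides whether the middle value is counted in both halves.
    Assumes at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[n - half:]
    return median(lower), median(ordered), median(upper)


sample = [1, 3, 4, 6, 7, 9, 10]  # hypothetical odd-length data set

print(quartiles(sample, include_median=False))  # median left out of the halves
print(quartiles(sample, include_median=True))   # median counted in both halves
```

With the median left out (the convention described above for the Texas Instruments calculators), the halves are 1, 3, 4 and 7, 9, 10, so Q1 = 3 and Q3 = 9. With the median included, the halves are 1, 3, 4, 6 and 6, 7, 9, 10, so Q1 = 3.5 and Q3 = 8. Same data, different answers.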
Additionally, the box-and-whisker plot may include a cross or an "X" marking the mean value of the data, in addition to the line inside the box that marks the median. The difference between the "X" and the median line can then be used as a measure of "skew".
Please don't ask me to explain "skew".
My first step is to find the median. Since there are eight data points, the median will be the average of the two middle values: (86 + 87) ÷ 2 = 86.5 = Q2
This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since the halves of the data set each contain an even number of values, the sub-medians will be the average of the middle two values.
Q1 = (79 + 80) ÷ 2 = 79.5
Q3 = (87 + 94) ÷ 2 = 90.5
Copyright © Elizabeth Stapel 2004-2011 All Rights Reserved
The minimum value is 77 and the maximum value is 99, so I have:
min: 77, Q1: 79.5, Q2: 86.5, Q3: 90.5, max: 99
Then my plot looks like this:
As you can see, you only need the five values listed above (min, Q1, Q2, Q3, and max) in order to draw your box-and-whisker plot. This set of five values has been given the name "the five-number summary".
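To show that those five values really are enough to draw the picture, here is a rough Python sketch that renders a one-line text version of a box-and-whisker plot. The function `text_boxplot`, its `width` parameter, and the characters used for the box and whiskers are my own choices for illustration; it assumes the minimum and maximum are different values.

```python
def text_boxplot(summary, width=45):
    """Draw a one-line text box-and-whisker plot from a five-number summary.

    summary is (min, Q1, Q2, Q3, max); assumes max > min.
    """
    lo, q1, q2, q3, hi = summary

    def col(value):
        # Scale a data value to a column index between 0 and width - 1.
        return round((value - lo) / (hi - lo) * (width - 1))

    line = [" "] * width
    for i in range(col(lo), col(hi) + 1):
        line[i] = "-"          # whiskers run from min to max
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="          # the box runs from Q1 to Q3
    line[col(lo)] = "|"        # end of the lower whisker (min)
    line[col(hi)] = "|"        # end of the upper whisker (max)
    line[col(q1)] = "["        # lower end of the box (Q1)
    line[col(q3)] = "]"        # upper end of the box (Q3)
    line[col(q2)] = "M"        # the median line inside the box
    return "".join(line)


plot = text_boxplot((77, 79.5, 86.5, 90.5, 99))
print(plot)
```

Nothing but the five-number summary goes into the function; the original data points are never consulted.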
The five-number summary consists of the numbers I need for the box-and-whisker plot: the minimum value, Q1 (the bottom of the box), Q2 (the median of the set), Q3 (the top of the box), and the maximum value (which is also Q4). So I need to order the set, find the median and the sub-medians, and then list the required values in order.
ordering the list: 53, 79, 80, 82, 87, 91, 93, 98, so the minimum is 53 and the maximum is 98
finding the median: (82 + 87) ÷ 2 = 84.5 = Q2
lower half of the list: 53, 79, 80, 82, so Q1 = (79 + 80) ÷ 2 = 79.5
upper half of the list: 87, 91, 93, 98, so Q3 = (91 + 93) ÷ 2 = 92
five-number summary: 53, 79.5, 84.5, 92, 98
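The same steps (order the list, find the median, find the sub-medians, list the values) can be sketched in Python as below. The function name `five_number_summary` is my own, and the starting order of the list is made up, since only the sorted values matter. For lists with an odd number of values, this sketch leaves the overall median out of both halves, which is only one of the conventions in use, so check it against your book's formula.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method.

    For odd-length lists the overall median is excluded from both halves.
    Assumes at least two values.
    """
    ordered = sorted(values)              # step 1: order the list
    half = len(ordered) // 2
    lower = ordered[:half]                # lower half of the list
    upper = ordered[len(ordered) - half:] # upper half of the list
    return (
        ordered[0],        # minimum
        median(lower),     # Q1: median of the lower half
        median(ordered),   # Q2: overall median
        median(upper),     # Q3: median of the upper half
        ordered[-1],       # maximum
    )


# Same eight values as the example above; the shuffled order is hypothetical.
scores = [82, 53, 98, 79, 91, 80, 93, 87]
print(five_number_summary(scores))
```

This reproduces the summary found by hand: 53, 79.5, 84.5, 92, 98.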
Part of the point of a box-and-whisker plot is to show how spread out your values are. But what if one or another of your values is way out of line? For this, we need to consider "outliers"....