
Box-and-Whisker Plots: Quartiles, Boxes, and Whiskers
(page 1 of 2)

Sections: Quartiles, boxes, and whiskers; Interquartile ranges and outliers


Many statistical summaries assume that your data points (the numbers in your list) are clustered around some central value. The "box" in the box-and-whisker plot contains, and thereby highlights, the middle half of these data points.

To create a box-and-whisker plot, you start by ordering your data (putting the values in numerical order), if they aren't ordered already. Then you find the median of your data. The median divides the data into two halves. To divide the data into quarters, you then find the medians of these two halves.

Note: If you have an even number of values, so the first median was the average of the two middle values, then each of those middle values goes into its own half for the sub-median computations. If you have an odd number of values, so the first median was an actual data point, then you do not include that value in either half. That is, to find the sub-medians, you look only at the data points that were not themselves used as the median.
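The ordering and splitting steps can be sketched in Python. This is a minimal illustration assuming the halves rule in the note above; `split_halves` is a hypothetical helper name, and the sub-medians come from the standard library's `statistics.median`.

```python
from statistics import median


def split_halves(values):
    """Order the values and split them into lower and upper halves.

    Even count: the two middle values each go into their own half.
    Odd count: the middle value (the median itself) goes into neither half.
    """
    data = sorted(values)               # step 1: put the values in order
    half = len(data) // 2
    if len(data) % 2:                   # odd: skip the middle data point
        return data[:half], data[half + 1:]
    return data[:half], data[half:]     # even: clean split down the middle


lower, upper = split_halves([5, 1, 4, 2, 3])
print(lower, upper)                     # [1, 2] [4, 5]
print(median(lower), median(upper))     # 1.5 4.5  (the two sub-medians)
```

With the five values 1 through 5, the median 3 is a data point, so it is left out and each half keeps two values.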

You now have three points: the median of the whole list, and the medians of the two halves (what I call the "sub-medians"). These three points divide the entire data set into quarters, called "quartiles". The top point of each quarter is named with a "Q" followed by the number of that quarter, so the top point of the first quarter of the data is "Q1", and so forth. Note that Q1 is also the middle number for the first half of the list, Q2 is also the middle number for the whole list, Q3 is the middle number for the second half of the list, and Q4 is the largest value in the list.

Once you have these three points, Q1, Q2, and Q3, you have all you need in order to draw a simple box-and-whisker plot. Here's an example of how it works.

  • Draw a box-and-whisker plot for the following data set:

    4.3,  5.1,  3.9,  4.5,  4.4,  4.9,  5.0,  4.7,  4.1,  4.6,  4.4,  4.3,  4.8,  4.4,  4.2,  4.5,  4.4

    My first step is to order the set. This gives me:

    3.9,  4.1,  4.2,  4.3,  4.3,  4.4,  4.4,  4.4,  4.4,  4.5,  4.5,  4.6,  4.7,  4.8,  4.9,  5.0,  5.1

    The first number I need is the median of the entire set. Since there are seventeen values in this list, I need the ninth value:

    3.9,  4.1,  4.2,  4.3,  4.3,  4.4,  4.4,  4.4,  [4.4],  4.5,  4.5,  4.6,  4.7,  4.8,  4.9,  5.0,  5.1

    The median is Q2 = 4.4.

    The next two numbers I need are the medians of the two halves. Since I used the "4.4" in the middle of the list, I can't re-use it, so my two remaining data sets are:

    3.9,  4.1,  4.2,  4.3,  4.3,  4.4,  4.4,  4.4  and   4.5,  4.5,  4.6,  4.7,  4.8,  4.9,  5.0,  5.1

    The first half has eight values, so the median is the average of the middle two:

      Q1 = (4.3 + 4.3)/2 = 4.3

    The median of the second half is:

      Q3 = (4.7 + 4.8)/2 = 4.75

    Since my list values have one decimal place and range from 3.9 to 5.1, I won't use a scale of, say, zero to ten, marked off by ones. Instead, I'll draw a number line from 3.5 to 5.5, and mark off by tenths.

    [Figure: my number line]

    Now I'll mark off the minimum and maximum values, and Q1, Q2, and Q3:

    [Figure: min, Q1, median, Q3, and max points marked off]

    The "box" part of the plot goes from Q1 to Q3:

    [Figure: drawing the 'box']

    And then the "whiskers" are drawn to the endpoints:

    [Figure: drawing the 'whiskers']
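The plots in this example are pictures, but the same drawing steps can be sketched in text. Below is a minimal Python illustration, not the method behind the figures: the function `ascii_box_plot` and its `width` parameter are hypothetical, and each value is simply rounded to the nearest character column, so positions are only as precise as the column width.

```python
def ascii_box_plot(summary, line_min, line_max, width=41):
    """Draw a horizontal box-and-whisker plot as one line of text.

    summary is the (min, Q1, Q2, Q3, max) tuple; line_min and line_max are
    the ends of the number line, which is split into `width` columns.
    """
    low, q1, q2, q3, high = summary

    def col(x):
        # Map a value to the nearest character column on the number line.
        return round((x - line_min) / (line_max - line_min) * (width - 1))

    row = [" "] * width
    for c in range(col(low), col(high) + 1):
        row[c] = "-"                        # first fill min..max with whisker dashes
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                        # then overwrite Q1..Q3 with the box
    row[col(q1)], row[col(q3)] = "[", "]"   # ends of the box
    row[col(q2)] = "|"                      # median line inside the box
    row[col(low)] = row[col(high)] = "|"    # whisker endpoints
    return "".join(row)


# Min, Q1, Q2, Q3, and max from the worked example, on a line from 3.5 to 5.5.
# With 41 columns, each column is 0.05, so each tenth is two columns.
line = ascii_box_plot((3.9, 4.3, 4.4, 4.75, 5.1), 3.5, 5.5)
print(line)
```

With these settings the min, the two ends of the box, the median, and the max land at columns 8, 16, 25, 18, and 32 respectively.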

By the way, box-and-whisker plots don't have to be drawn horizontally as I did above; they can be vertical, too.


More terminology: The top end of your box may also be called the "upper hinge"; the lower end may also be called the "lower hinge". The lower hinge is also called "the 25th percentile"; the median is "the 50th percentile"; the upper hinge is "the 75th percentile". This means that roughly 25%, 50%, and 75% of the data, respectively, lies at or below that point. The distance between the hinges may be referred to as the "H-spread" or, as you will see on the following page, the "Interquartile Range", abbreviated "IQR". ("Hinge" actually has a different technical definition, but the term is sometimes used informally.)

Also, some books and software will include the overall median (Q2) in both halves when computing Q1 and Q3 for data sets with an odd number of elements. The Texas Instruments calculators do not include Q2 in this case, so you may encounter a book answer that doesn't match the calculator answer. Different software packages also use a variety of other formulas. Be careful to use the formula from your book when doing your homework!
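To see that the choice actually changes the answer, here is a sketch that computes the quartiles both ways for the seventeen-value example above. The function `quartiles` and its `include_median` flag are hypothetical names for illustration; the sketch covers only the two "medians of the halves" conventions described here, not the other formulas some software uses.

```python
from statistics import median


def quartiles(values, include_median):
    """Q1, Q2, Q3 as the medians of the halves and of the whole list.

    For an odd count, include_median=False leaves the middle value out of
    both halves; include_median=True puts it into both halves.
    """
    data = sorted(values)
    n, half = len(data), len(data) // 2
    if n % 2 and include_median:
        lower, upper = data[:half + 1], data[half:]
    elif n % 2:
        lower, upper = data[:half], data[half + 1:]
    else:
        lower, upper = data[:half], data[half:]   # even count: same either way
    return median(lower), median(data), median(upper)


data = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6,
        4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4]
print(quartiles(data, include_median=False))  # Q3 comes out 4.75
print(quartiles(data, include_median=True))   # Q3 comes out 4.7
```

Here Q1 happens to agree (4.3 both ways), but Q3 is 4.75 when the median is left out and 4.7 when it is included. For an even number of values the two conventions give the same result.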

Additionally, the box-and-whisker plot may include a cross or an "X" marking the mean value of the data, in addition to the line inside the box that marks the median. The difference between the "X" and the median line can then be used as a measure of "skew".

Please don't ask me to explain "skew".
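Placing the "X" only requires the mean. As a minimal sketch using Python's standard `statistics` module, here are the mean and median of the seventeen-value example side by side, along with the difference between them that the paragraph above mentions:

```python
from statistics import mean, median

# The seventeen values from the first worked example.
data = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6,
        4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4]

m = mean(data)      # where the "X" goes
med = median(data)  # where the line inside the box goes
print(f"mean = {m:.2f}, median = {med:.2f}, mean - median = {m - med:.2f}")
```

For this data the mean (4.5) sits a little above the median (4.4).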


  • Draw the box-and-whisker plot for the following data set:
    77,  79,  80,  86,  87,  87,  94,  99
  • My first step is to find the median. Since there are eight data points, the median will be the average of the two middle values: (86 + 87) ÷ 2 = 86.5 = Q2

    This splits the list into two halves: 77,  79,  80,  86  and  87,  87,  94,  99. Since the halves of the data set each contain an even number of values, the sub-medians will be the average of the middle two values.

      Q1 = (79 + 80) ÷ 2 = 79.5
      Q3 = (87 + 94) ÷ 2 = 90.5

    The minimum value is 77 and the maximum value is 99, so I have:

      min: 77, Q1: 79.5, Q2: 86.5, Q3: 90.5, max: 99

    Then my plot looks like this:

      [Figure: box-and-whisker plot]

As you can see, you only need the five values listed above (min, Q1, Q2, Q3, and max) in order to draw your box-and-whisker plot. This set of five values has been given the name "the five-number summary".

  • Give the five-number summary of the following data set:
    79,  53,  82,  91,  87,  98,  80,  93
  • The five-number summary consists of the numbers I need for the box-and-whisker plot: the minimum value, Q1 (the bottom of the box), Q2 (the median of the set), Q3 (the top of the box), and the maximum value (which is also Q4). So I need to order the set, find the median and the sub-medians, and then list the required values in order.

    ordering the list: 53,  79,  80,  82,  87,  91,  93,  98, so the minimum is 53 and the maximum is 98

    finding the median: (82 + 87) ÷ 2 = 84.5 = Q2

    lower half of the list: 53,  79,  80,  82, so Q1 = (79 + 80) ÷ 2 = 79.5

    upper half of the list: 87,  91,  93,  98, so Q3 = (91 + 93) ÷ 2 = 92

      five-number summary: 53,  79.5,  84.5,  92,  98
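The recipe in this exercise (order the list, find the median and the sub-medians, then list min, Q1, Q2, Q3, and max) can be sketched as a single Python function. This assumes the halves convention used on this page (for an odd count, the middle value is left out of both halves); `five_number_summary` is a hypothetical name.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max), with Q1 and Q3 as medians of the halves.

    For an odd count, the middle value is left out of both halves.
    """
    data = sorted(values)              # order the list first
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary([79, 53, 82, 91, 87, 98, 80, 93]))
# (53, 79.5, 84.5, 92.0, 98)
```

Because the function sorts its input, the list can be passed in the unordered form given in the exercise.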

Part of the point of a box-and-whisker plot is to show how spread out your values are. But what if one or another of your values is way out of line? For this, we need to consider "outliers"....


Cite this article as:

Stapel, Elizabeth. "Box-and-Whisker Plots: Quartiles, Boxes, and Whiskers." Purplemath. Available from http://www.purplemath.com/modules/boxwhisk.htm. Accessed
 

 


This lesson may be printed out for your personal use.


  Copyright © 2006-2008  Elizabeth Stapel

 
