Scatterplots and Regressions (page 3 of 4)
The point of collecting data and plotting the collected values is usually to try to find a formula that can be used to model a (presumed) relationship. I say "presumed" because the researcher may end up concluding that there isn't really any relationship where he'd hoped there was one. For instance, you could run experiments timing a ball as it drops from various heights, and you would be able to find a definite relationship between "the height from which I dropped the ball" and "the time it took to hit the floor". On the other hand, you could collect reams of data on the colors of people's eyes and the colors of their cars, only to discover that there is no discernible connection between the two data sets.
The process of taking your data points and coming up with an equation is called "regression", and the graph of the "regression equation" is called "the regression line". If you're doing your scatterplots by hand, you may be told to find a regression equation by putting a ruler against the first and last dots in the plot, drawing a line, and guessing the line's equation from the picture. This is an incredibly clumsy way to proceed, and can give very wrong answers, especially since values at the ends often turn out to be outliers (numbers that don't quite fit with everything else).
If you're finding regression equations with a ruler, you'll need to work extremely neatly, of course, and using graph paper would probably be a really good idea. Once you've drawn in your line (and this will only work for linear, or straight-line, regressions), you will estimate two points on the line that seem to be close to where the gridlines intersect, and then find the line equation through those two points. From the above graph, I would guess that the line goes close to the points (3, 7) and (19, 1), so the regression equation would be y = (–3/8)x + 65/8.
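If you'd like to check the arithmetic for the line through two estimated points, here is a minimal Python sketch. The helper name `line_through` is my own invention for illustration; it just applies the slope formula and then solves for the y-intercept, using exact fractions so the result can be compared directly with the hand computation.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points, as exact fractions."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    slope = Fraction(y2 - y1, x2 - x1)       # rise over run
    intercept = Fraction(y1) - slope * x1    # solve y1 = slope*x1 + b for b
    return slope, intercept

# The two points estimated from the hand-drawn line in the text
slope, intercept = line_through((3, 7), (19, 1))
print(f"y = ({slope})x + {intercept}")  # prints: y = (-3/8)x + 65/8
```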
Most likely, though, you'll be doing regressions on your calculator. Doing regressions properly is a difficult and technical process, but your graphing calculator has been programmed with the necessary formulas and has the memory to crunch the many numbers. The calculator will give you "the" regression line. If you're working by hand, you and your classmates will get slightly different answers; if you're using calculators, you'll all get the same answer. (Consult your owner's manual or calculator web sites for specific information on doing regressions with your particular calculator model.)
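To give a rough idea of what the calculator is doing behind the scenes, here is a small Python sketch of a straight-line fit. I'm assuming the usual "least squares" method, which is what graphing calculators typically use for linear regression; the function name and the data values below are made up for illustration and do not come from this lesson.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for the data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x-values are equal; no unique line exists")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the mean point
    return slope, intercept

# Hypothetical data, chosen only to keep the arithmetic easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Because the formula is fixed, everybody who enters the same data gets the same slope and intercept, which is why calculator answers agree while ruler answers don't.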
If you're supposed to report how "good" a given regression is, then figure out how to find the correlation values on your calculator. These values measure the degree to which the regression equation matches the scatterplot. The closer these correlation values are to 1 (or to –1), the better a fit your regression equation is to the data values. If the correlation value is more than 0.8 or less than –0.8, the match is judged to be pretty good; if the value is between –0.5 and 0.5, the match is judged to be pretty poor; and a correlation value close to zero means you're kidding yourself if you think there's really a relationship of the type you're looking for. (There should be instructions, somewhere in your owner's manual, for finding this information.) When you're doing a regression, you're trying to find the "best fit" line to the data, and the correlation numbers help you to tell how good your "fit" is.
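Here is a hedged sketch of how a correlation value for a straight-line fit could be computed and then judged by the rule of thumb above. I'm assuming the value in question is the standard (Pearson) correlation coefficient r, and that the "pretty poor" band runs from –0.5 to 0.5; the text doesn't label the zone between 0.5 and 0.8, so the "in between" label is my own. The function names and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x's or all y's are equal")
    return sxy / math.sqrt(sxx * syy)

def describe_fit(r):
    """Apply the rule of thumb: beyond +/-0.8 is good, within +/-0.5 is poor."""
    strength = abs(r)
    if strength > 0.8:
        return "pretty good"
    if strength <= 0.5:   # assumed band: between -0.5 and 0.5
        return "pretty poor"
    return "in between"   # my label; the rule of thumb doesn't name this zone

# Hypothetical data lying close to the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}: {describe_fit(r)}")
```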
After plugging these values into the STAT utility of my calculator, I can then do a linear regression:
...and a cubic regression:
Copyright © Elizabeth Stapel 2005-2011 All Rights Reserved
The dots look a little curvy on the scatterplot, so it's reasonable that the curvy line, the cubic y = 0.000829x^3 + 0.23x^2 – 1.09x + 24.60, is a better fit to the data points than the straight-line linear model y = 6.03x – 10.64.
Since the correlation value is closer to 1 for the cubic and since the graph of the cubic model is closer to the dots, the cubic equation y = 0.000829x^3 + 0.23x^2 – 1.09x + 24.60 is the better regression.
You shouldn't expect, by the way, always to get correlation values that are close to 1. If they tell you to find, say, the linear regression equation for a data set, and the correlation value is close to zero, this doesn't mean that you've found the "wrong" linear equation; it only means that a linear equation probably wasn't a good model for the data. A quadratic model, for instance, might have been better.
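To see this last point in action, here is a small Python sketch that fits both a linear and a quadratic model to the same data and compares how well each matches. The data are made up (points lying exactly on y = x^2), the function names are mine, and I'm using R^2 (the "coefficient of determination") as the measure of fit, which is an assumption; your calculator may report r, r^2, or R^2 depending on the model. The fit itself is a plain least-squares solve, not necessarily the method your calculator uses.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    size = degree + 1
    # Normal equations: a[i][j] = sum of x^(i+j), b[i] = sum of y * x^i
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("not enough distinct x-values for this degree")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back-substitution
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs

def r_squared(xs, ys, coeffs):
    """Return R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return 1 - ss_res / ss_tot

# Hypothetical data lying exactly on the parabola y = x^2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

linear = polyfit(xs, ys, 1)
quadratic = polyfit(xs, ys, 2)
print(f"linear    R^2 = {r_squared(xs, ys, linear):.3f}")
print(f"quadratic R^2 = {r_squared(xs, ys, quadratic):.3f}")
```

The linear fit here is still the best possible straight line for these points; it just happens that no straight line describes a parabola well, while the quadratic model matches the dots exactly.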