Scatterplots and Regressions (page 4 of 4)
Until (and unless) you get into a statistics class, the preceding pages cover pretty much all there is to scatterplots and regressions. You draw the dots (or enter them into your calculator), you eyeball a line (or find one in the calculator), and you see how well the line fits the dots. About the only other things you might do are "extrapolate" and "interpolate".
Remember that the point of all this data-collection, dot-drawing, and regression-computing was to try to find a formula that models... whatever it is that they're measuring. You can use these models to try to find missing data points or to try to project into the future (or, sometimes, into the past). If you have data, say, for the years 1950, 1960, 1970, and 1980,
and you find a model for your data, you might use it to guess at values between these dates. For
instance, given Namibian population data for the listed years, you might try to guess the population
of Namibia in 1965. The prefix "inter" means "between", so this guessing-between-the-points
would be interpolation. On the other hand, you might try to work backwards to guess the population
in 1940, or try to fill in the missing data up through 2000. The prefix "extra"
means "outside", so this guessing-outside-the-points would be extrapolation.
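The between-versus-outside distinction can be sketched in a few lines of Python. This is only an illustration: the function name kind_of_estimate is made up for this example, and the list of years is the 1950 to 1980 sample mentioned above.

```python
def kind_of_estimate(year, known_years):
    """Label a guess as interpolation (inside the data) or extrapolation (outside)."""
    if min(known_years) <= year <= max(known_years):
        return "interpolation"
    return "extrapolation"


# The sample years from the paragraph above.
known_years = [1950, 1960, 1970, 1980]

for year in (1965, 1940, 2000):
    print(year, kind_of_estimate(year, known_years))
```

With these years, 1965 falls between the first and last data points, while 1940 and 2000 both fall outside them.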
Setting my window range as 0 < X < 55, counting by 5's, and 500 < Y < 2000, counting by 250's, my calculator gives me the following scatterplot:
The dots look like they line up in a curve, so I'll try a quadratic regression. The calculator gives me:
As you can see, I've set the calculator to "DiagnosticsOn", so it displays the correlation value whenever I do a regression. This regression looks pretty darned good, especially when it's graphed with the data values:
...so I'll use this model for my computations. Now that I have an equation for modelling Namibia's population, I can use it to estimate the population in the given years.
For 1940, I'll use t = –10, since this is ten years before 1950. (This is an extrapolated value, since I'm going outside the data set.)
f(–10) = 0.4958(–10)² + 1.9389(–10) + 538.6993 = 568.8903
For 2005, I'll use t = 55; this will be another extrapolated value.
f(55) = 0.4958(55)² + 1.9389(55) + 538.6993 = 2145.1338
For 1997, I'll use t = 47. Since this value is between known values, this will be an interpolated answer.
f(47) = 0.4958(47)² + 1.9389(47) + 538.6993 = 1725.0498
Remembering that the population values are in thousands, I'll add three zeroes to my numbers and round to get my final answers:
The estimated value for the population in 1940 is about 569 000; for 2005, the estimated value is about 2.15 million; and for 1997, the estimated value is about 1.73 million.
Depending on your calculator, you may need to memorize what the regression values mean. On my old TI-85, the regression screen would list values for a and b for a linear regression. But I had to memorize that the related regression equation was "a + bx", instead of the "ax + b" that I would otherwise have expected, because the screen didn't say. If you need to memorize this sort of information, do it now, because the teacher will not bail you out if you forget on the test what your calculator's variables mean.
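If you'd like to double-check the three estimates without a graphing calculator, here is a minimal Python sketch. It assumes the quadratic model quoted above, with t counted in years since 1950; the names A, B, C, and population_thousands are just labels chosen for this example.

```python
# Coefficients of the quadratic model f(t) = A*t^2 + B*t + C quoted above.
A, B, C = 0.4958, 1.9389, 538.6993


def population_thousands(year, base_year=1950):
    """Evaluate the model for a calendar year; the result is in thousands."""
    t = year - base_year  # e.g. 1940 gives t = -10
    return A * t**2 + B * t + C


for year in (1940, 2005, 1997):
    estimate = population_thousands(year)
    print(f"{year}: t = {year - 1950:>3}, f(t) = {estimate:.4f} thousand")
```

Naming each coefficient and writing the formula out in full also sidesteps the "a + bx" versus "ax + b" confusion: the code says exactly which number multiplies which power of t.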
Copyright © 2005-2012 Elizabeth Stapel